knitr::opts_chunk$set(warning=FALSE)

Needed libraries

library(dplyr)
library(countrycode)
library(outliers)
library(caret)
library(cluster)
library(factoextra)
library(NbClust)
library("DMwR")
library("RWeka")
library("C50")
library("rpart")
library("themis")
library(rattle)
library(rpart.plot)
library(RColorBrewer)

phase 1

Problem statement

Prediction of cyber security employees’ salaries based on 11 attributes

1.work_year

2.experience_level

3.employment_type

4.job_title

5.salary

6.salary_currency

7.salary_in_usd

8.employee_residence

9.remote_ratio

10.company_location

11.company_size

Problem description

We are living in the “information age” or rather the “data age”, meaning that everything around us revolves around data. The data has become one of the most valuable assets that a person or an organisation can have, since it has a significant value, losing it will lead to significant damages. Thus, most of the attacks nowadays are directed toward the data. To guard against such damages, organisations have realised the importance of protecting their digital assets, leading them to hire cybersecurity specialists. This made cybersecurity gain popularity among people so there’s a growing tendency to study cybersecurity. Consequently this resulted in the emergence of plentiful professionals with various experience levels and skills in this field. As a result, organisations may find it difficult to decide a salary for job candidates solely based on the CV. also, since the attacks improve rapidly, organisations need to hire more employees in the far future to defend against such attacks but it’s not an easy matter to predict the future payroll which may hinders some of the organisation’s plans. Another issue arises when the decision makers in the organisation aren’t fully aware of the trends on salary. Their lack of awareness gives a chance for the competitor organisations to attract their employees to them by offering a better salary that match current trends

Data mining task

Prediction of the cyber security employees’ salary categories (Very Low, Low, , High, Very High) using classification methods.

Goal

Given the problems we discussed and In order to better understand this field, we decided to analyse a dataset of 1247 cybersecurity employees, containing information such as salary, job title, and experience level. Analysing this dataset can provide insightful predictions regarding the salary range of a cybersecurity employee, which can help in

  • Making better decisions
  • Making recruitment and hiring process easier and more efficient
  • Predicting the future payroll
  • Increasing loyalty
  • Increasing the satisfaction rate
  • Achieving fairness

Source of data:

https://www.kaggle.com/datasets/deepcontractor/cyber-security-salaries

Reading and viewing dataset

dataset= read.csv(url("https://raw.githubusercontent.com/SarahAlhindi/DM_project/main/Data%20Set/salaries_cyber.csv"), header=TRUE)
View(dataset)

Original dataset

we will keep a copy of the original dataset before data preprocessing to use if needed at any time

originalDataset= dataset

General information about the dataset:

No. of attributes: 11
Type of attributes: Ordinal , Nominal, and Numeric
No. of objects: 1247
Class label: salary_in_usd

ncol(dataset)
nrow(dataset)
names(dataset)
str(dataset)

Attributes’ description table

Attribute Name Description Data Type Possible values
work_year The year in which salary was recorded Numerical 2020 to 2022
experience_level Expertise level of the employee Ordinal En “Entry level”
MI “Mid level”
SE “Senior level”
EX “Executive level”
employment_type The nature or category of employee’s engagement in the job Nominal PT “Part time”
FT “Full time”
CT “Contract
FL”Freelancer”
job_title The role worked in during the year Nominal

Different titles.

like Security Analyst, security researcher

salary The total gross salary amount paid Numerical 1740-50001566
salary_currency The currency of the salary paid to the employee Nominal

Different currencies according to ISO 4217 currency code.

like DE,CA

salary_in_usd The salary paid in United states dollar Numerical 2000 to 365596.40
employee_residence Employee’s primary country of residence Nominal

Different countries.

like US,AE

remote_ratio Percentage of online work by employee in the specified year Numerical 0 “No remote work”
50 “Partially remote”
100 “Fully remote”
company_location The country of the employer’s main office Nominal

Different countries.

like BR,BW

company_size How big/small is the company Ordinal S , M or L

phase 2

sample of 20 employees from the dataset:

using sample_n(table,size) function and using (set_seed())

set.seed(30)
sample=sample_n(dataset,20)
print(sample)

Show the missing value:

if it is FALSE it means no null value,if it is TRUE there is null value. In our dataset there is no null values.

is.na(dataset)
sum(is.na(dataset))

Show the Min.,1st Qu.,Median,Mean ,3rd Qu.,Max. for each numeric column

in work_year Most data falls within the years 2021, with some in 2020 and 2022. in salary Salaries vary widely, with a high average and an exceptionally high maximum. in salary_in_usd Represents salaries in USD, ranging from a minimum to a maximum. in remote_ratio Indicates the percentage of remote work, with a median and 3rd quartile at 100%, but a mean slightly below, suggesting some variability.

summary(dataset$work_year)
summary(dataset$salary)
summary(dataset$salary_in_usd)
summary(dataset$remote_ratio)

Show the variane of each numeric column

variance is to understand the spread or dispersion of the values in each column. A higher variance indicates that the values are more spread out from the mean and in our dataset the highest attribute is salary, while a lower variance indicates that the values are closer to the mean which in our datas it is work year attribute.

var(dataset$work_year)
var(dataset$salary)
var(dataset$salary_in_usd)
var(dataset$remote_ratio)

Visualization of relationship between some pairs of attributes:

Here we used boxplot to see the distribution between salary_in_usd and experience_level We observed that salaries vary depending on the level of experience,they are positively correlated.

boxplot(salary_in_usd ~ experience_level, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999)

Here we used boxplot to see the distribution between salary_in_usd and work_year We observed that 2021 salaries were close to each other but in 2022 the gap between them getting bigger.

boxplot(salary_in_usd ~ work_year, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999)

Here we used boxplot to see the distribution between salary_in_usd and employment_type We observed that Full Time (FT) offers more salary than the other categories.

boxplot(salary_in_usd ~ employment_type, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999)

Here we used boxplot to see the distribution between salary_in_usd and company_size We observed that the larger the company is the higher the salary was.

boxplot(salary_in_usd ~ company_size, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999) 

Data Reduction

Dimensionality Reduction

The “salary” column gives the same information as “salary_in_usd” it’s just a matter of currency exchange, and we will eventually transform all the values in “salary” column to one common currency so we can properly deal with them. To further confirm that the two column are redundant, we will use the latest exchange rate for USD to the desired currency.

we will start by creating a temporary column named “converted_salary” to save the salary that we will get by using the exchange rate to convert the salary_in_usd to the salary with different currencies to compare with “salary” column

convertedDataset=dataset


convertedDataset$exchange_rate = factor(convertedDataset$salary_currency, levels=c("USD","BRL","GBP","EUR","INR","CAD","CHF","DKK","SGD","AUD","SEK","MXN","ILS","PLN","NOK","IDR","NZD","HUF","ZAR","TWD","RUB"), labels=c(1/1,1/0.20,1/1.22,1/1.06,1/0.012,1/0.74,1/1.10,1/0.14,1/0.73,1/0.64,1/0.090,1/0.057,1/0.26,1/0.23,1/0.093,1/0.000065,1/0.60,1/0.0027,1/0.053,1/0.031,1/0.010))
convertedDataset$exchange_rate = as.numeric(as.character(convertedDataset$exchange_rate))
convertedDataset$converted_salary = convertedDataset$salary_in_usd*convertedDataset$exchange_rate



set.seed(1)
salary_sample <- sample_n(convertedDataset[,c("salary","converted_salary")],10)

print(salary_sample)

as shown in the sample, the two columns are almost identical. This can be proved by correlation coefficient as well.

correlation <- cor(convertedDataset$salary , convertedDataset$converted_salary)
print(correlation)

The correlation is so high but it hasn’t reached 100% possibly due to rounding in the calculations and slight differences in the exchange rate over time.

To make the mining process more effiecent and has an improved quality, we decided to remove the “salary” column.

dataset<-dataset[,-c(5)]

Find the outliers and remove them:

We will show outliers with boxPlots and then remove them, to minimize noise and to get better analytical results when applying data mining techniques.

now we show (salary_in_usd) attributes’ outliers. we can see that there are many outliers with exceptionally high values, thus we will remove them.

boxplot(dataset$salary_in_usd)



OutSalary = outlier(dataset$salary_in_usd, logical =TRUE)
Find_outlier = which(OutSalary ==TRUE, arr.ind = TRUE)
dataset= dataset[-Find_outlier,]

now we show (remote_ratio) attributes’ outliers. we can see there aren’t outliers in remote_ratio, thus we don’t need the last step i.e: removing outliers’ rows.

boxplot(dataset$remote_ratio)

now we show (work_year) attributes’ outliers. we can see there aren’t outliers in work_year, thus we don’t need the last step i.e: removing outliers’ rows.

boxplot(dataset$work_year)

Concept hierarchy generation for nominal data

the columns “company_location” and “employee_residence” have the name of countries for the company and employee respectively. And these attributes can be generalized to higher-level concept that is region to help understand and analyze the dataset better and improve algorithm performance.

We will use the 7 regions as defined in the World Bank Development Indicators. These regions are:

  1. East Asia and Pacific: This region includes countries like China, Australia, Indonesia, Thailand, etc.

  2. Europe and Central Asia: This region includes countries like Germany, UK, Russia, Turkey, etc.

  3. Latin America & Caribbean: This region includes countries like Brazil, Mexico, Argentina, Cuba, etc.

  4. Middle East and North Africa: This region includes countries like Saudi Arabia, Egypt, Iran, Iraq, etc.

  5. North America: This is predominantly United States and Canada.

  6. South Asia: This region includes countries like India, Pakistan, Bangladesh, Sri Lanka, etc.

  7. Sub-Saharan Africa: This region includes countries like Nigeria, South Africa, Ethiopia, Kenya, etc.

Note: UM(The United States Minor Outlying Islands) and AQ(Antarctica) don’t belong to any of these regions, thus, they will be used as they are.



um=which(dataset$company_location=="UM")
aq=which(dataset$company_location=="AQ")


dataset$company_location <- countrycode(dataset$company_location, "iso2c", "region")
dataset$employee_residence <- countrycode(dataset$employee_residence, "iso2c", "region")

dataset[um,"company_location"]="UM"
dataset[aq,"company_location"]="AQ"

Concept hierarchy generation can be done for “job_title” as well to improve interpretation and scalability. Also, most job titles are essentially the same job but with different names, so we can combine them into a higher-level jobs titles such as Architect, Analyst and Engineer etc.

## Create the categories based on job rank 
dataset$job_title <- ifelse(grepl("Analyst", dataset$job_title), "Analyst",
                                ifelse(grepl("Architect", dataset$job_title), "Architect",
                                       ifelse(grepl("Engineer", dataset$job_title), "Engineer",
                                              ifelse(grepl("Manager|Officer|Director|Leader", dataset$job_title), "Leadership",
                                                     ifelse(grepl("Consultant|Specialist", dataset$job_title), "Consultant/Specialist",
                                                            ifelse(grepl("Cyber", dataset$job_title), "Cyber Security",
                                                                   "Others"))))))

Encoding categorical data

To deal with columns with character type we are going to encode them, because most machine learning algorithms are designed to work with factors data rather than character data and it improves performance and Interpretability of data as well.

dataset$job_title  <- factor(dataset$job_title)

dataset$experience_level = factor(dataset$experience_level, levels=c("EN", "MI", "SE", "EX"), labels=c(1,2,3,4))

dataset$employment_type  <- factor(dataset$employment_type)

dataset$employee_residence  <- factor(dataset$employee_residence)

dataset$company_location  <- factor(dataset$company_location)

dataset$salary_currency  <- factor(dataset$salary_currency)

dataset$job_title  <- factor(dataset$job_title)


dataset$company_size = factor(dataset$company_size, levels=c("S","M","L"), labels=c(1,2,3))


dataset$job_title  <- factor(dataset$job_title)

Discretization of salaray_in_usd attribute

by calculating breaks based on quartiles

breaks <- quantile(dataset$salary_in_usd, 
                   probs = c(0, .25, .5, .75, .95, 1), 
                   na.rm = TRUE)


dataset$salary_in_usd <- cut(dataset$salary_in_usd, 
                                       breaks = breaks, 
                                       include.lowest = TRUE, 
                                       labels=c("Very Low", "Low", "Medium", "High", "Very High"))

Normalization:

to change the scale of numeric attributes (remote_ratio and work_year) to a scale of [-1,1] to give them equal weight

dataset [, c("work_year" , "remote_ratio")] = scale(dataset [, c("work_year" , "remote_ratio")])

Feature Selection

we will implement feature selection to remove redundant or irrelevant attributes from the data set to get the smallest subset that can help us get the most accurate predictions for our target class(salary_in_usd) and decrease the time that it takes the classifier to process the data.

we will use RFE(Recursive feature elimination) which is a wrapper method for the feature selection. Since the RFE function have multiple control options we need to specify the options that we want. We will choose “Random Forest” because it has high accuracy, can handle categorical data.

control <- rfeControl(functions = rfFuncs, 
                      method = "repeatedcv",
                      repeats = 5, 
                      number = 10)

First we save the features to be used in the feature selection(every attributes except the class label “salary_in_usd”) in variable x, and the class label in variable y. Then split the data to 80% training and 20% test.

x <- dataset %>%
  select(-salary_in_usd) %>%
  as.data.frame()

# Target variable
y <- dataset$salary_in_usd

# Training: 80%; Test: 20%
set.seed(2021)
inTrain <- createDataPartition(y, p = .80, list = FALSE)[,1]

x_train <- x[ inTrain, ]
x_test  <- x[-inTrain, ]

y_train <- y[ inTrain]
y_test  <- y[-inTrain]

after splitting the data, now we can perform the selection using rfe

set.seed(1)
result_rfe1 <- rfe(x = x_train, 
                   y = y_train, 
                   sizes = c(1:9),
                   rfeControl = control)

result_rfe1

predictors(result_rfe1)

The results show that all the remaining attributes, except for “employment_type”, are selected. This is logical, as 98% of the rows have the value “FT”, as shown in the table below. Due to the low variance, we decided to remove this attribute.

table(dataset$employment_type)
dataset<-dataset[,-which(names(dataset)=="employment_type")]

phase 3

dataset2= read.csv(url("https://raw.githubusercontent.com/SarahAlhindi/DM_project/main/Data%20Set/preprocessedDataset.csv"), header=TRUE)


char_vars <- sapply(dataset2, is.character)
dataset2[char_vars] <- lapply(dataset2[char_vars], as.factor)

balancing data

To resolve the problem of class imbalance in the dataset, we will use SMOTE() method that oversample the minority class by creating synthetic samples using the existing minority class samples

data_balanced <- SMOTE(salary_in_usd ~ ., dataset2, perc.over = 300, perc.under=500, k = 10)

Classification

The goal of all preceding steps is to properly prepare the dataset for the classification phase, which constitutes one of our primary mining objectives. In this section, we will employ various attribute selection methods such as the Gini index, Gain ratio, and information gain to construct a decision tree model. We will thoroughly evaluate its performance, and if it proves effective, it can subsequently be utilized to classify new instances with unknown class labels.

since our dataset is small, we decided to use K-fold Cross-validation. for each attribute selection method we will try different K size (10,5, and 3).

in all this section we will be using train and trainControl functions of caret package to produce decision trees. for Gini index the method will be “rpart” and for Gain ratio it’s “j48” as for information gain the method is “C5.0”.

the following function will be used to compute average sensitivity and Specificity:



macro = function(matrix){
  
  sumSen=0
  
  for (i in 1:5) {
   sumSen = sumSen + matrix$byClass[i,1] 
  }
  
  
  avgSen = sumSen/5
  
  sumSpec=0
  
  for (i in 1:5) {
   sumSpec = sumSpec + matrix$byClass[i,2] 
  }
  avgSpec = sumSpec/5
  
  
  
  
  sumPrec=0
  
  for (i in 1:5) {
   sumPrec = sumPrec + matrix$byClass[i,3] 
  }
  avgPrec = sumPrec/5
  
  
  
  
  avgs = data.frame(Sensitivity=avgSen , Specificity=avgSpec, Precision=avgPrec ,Accuracy= unname( matrix$overall[1]) )
  print(avgs)
  
  
}

Gini index

Gini index measures the impurity of the dataset. The partitioning that yields the most substantial reduction in impurity is selected as the splitting attribute. To apply the Gini index, we will employ the “rpart” method, which utilizes the Gini index as the criteria for splitting.

10 Folds
set.seed(10)
ctrl <- trainControl(method = "cv", number = 10, returnResamp="all", savePredictions="final")

giniIndex10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "rpart",trControl = ctrl)

prp(giniIndex10$finalModel, box.palette = "Reds", tweak = 1.2, varlen = 20)

caret::confusionMatrix(giniIndex10$pred$obs,giniIndex10$pred$pred)
5 Folds
```r
set.seed(10)
ctrl <- trainControl(method = \cv\, number = 10, returnResamp=\all\, savePredictions=\final\)

giniIndex10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = \rpart\,trControl = ctrl)

prp(giniIndex10$finalModel, box.palette = \Reds\, tweak = 1.2, varlen = 20)

<!-- rnb-source-end -->

<!-- rnb-plot-begin -->

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABoEAAAQFCAMAAABJpZH8AAABfVBMVEUAAAAAABEAAB0AADoAAGYAGBEAGB0AGCoAKzUAOjoAOmYAOpAAZrY5AAA5GAA5GBE5GB05Kx05Kyo5KzU5PCo5PDU5PD86AAA6OgA6Ojo6OmY6ZmY6ZpA6ZrY6kLY6kNtkAABkGABkGBFkKwBkPCpkPDVkSzVkSz9kS0pmAABmOgBmOjpmOmZmZgBmZjpmZmZmZpBmkJBmkLZmkNtmtttmtv+NGACNKwCNKxGNPBGNPB2NSyqNSz+NWz+NW0qQOgCQZjqQZmaQkGaQkLaQtraQttuQ27aQ2/+zKwCzKxGzPBGzPB2zPCqzWyqzWz+zW0qzajWzakq2ZgC2Zjq2kDq2kGa2kJC2tma2tra2ttu229u22/+2///XPBHXPB3XSx3XSyrXWyrXWzXXW0rXajXXaj/XakrbkDrbkGbbtmbbtpDbtrbb25Db27bb29vb2//b///7Sx37Wyr7WzX7Wz/7ajX7aj/7akr/tmb/25D/27b/29v//7b//9v////HWNPrAAAACXBIWXMAACToAAAk6AGCYwUcAAAgAElEQVR4nO3d/aPkVnkn+DKNjW3sQMLGYF4CzoTBxLMThsWJZ5YNHmY809nswqQ9yWQgHliz617YdaAN426adP/tc6tKUh29q+qW9Ojl8/nB7nvrlko6Onq+0pFK2j0FgAi76BkAYKMkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQjOrebu+T/xA9H2XjztW1pv7kb9/4/M2EXv1XP77GXDFDEghGJYEu9eTdXeHLMmidJBCMSgJd6Lef3yXu/McrzRuzIoFgVBLoMv/0zd1OBK2eBIJRSaDbTCMxtybkGiQQjEoC9Xn89pv1X5bH4A6+et2ZZA4kEGzQfBLowz/d7RoSKDsE+vSPnz79xZ9m//7VteeTcBIINmgmCfTk54dDnXoCZWeBXkp/mNtxJFcggdiAJ393/F7Jt3509lsfv33zzjv1i4F7Jvn4jcZ3DZqbJ7+4efPuhfq7H3//1X0lfuFL71SPBs5ewHpGXNZGze8amEC/ezcbaqsn0HEQLr/64J5rEdZKArF66fdKXnwn/+X3Kue37yeXXGUvvpm8tZwHjZPM33cziZ+fTl1Uq3HLW7O/u3nDz4szIF8uBU0+GHX4iD9KX2qbZIdBc3W/WIjC/VIWdC9LTwJ9+Ebx3oZRuH/8++//yy/k427zPJfGFUgg1u5xUrlvfCWravmp7pfKPx7KbZ5AeUylVbdjkkUCPUpKa6V6tr21SKCkqJdq7v3S+3Yv/kPv3HQZNFfZ6Fdy/iVrkOw3PcvSmRg//0LyzoYESpU/lTWRQKxc9XsleeQ8fZSWvzxrjlUu++k730vfdyqorZPMEug/fDN5S7kat741T6DyRcinovtu5X2nl9on2WHYXNVGvx4lTda7LO0J9OTd9Eq33sO2n6efyqpIINbtyfd2Vfmo0r1TTOSlNSu2DW9Kan77JKuvfPVppRr3zs2rlVfzqnu/9r683ndMssOwuXpUnV76tt5laUugx2+n7+m9386H2Z87BFojCcT06tX0Uv2flZXu/X72k/98/He+T5/vw790+mctSV780U3FzH96s2+SlaJ8+G2pGnfMTXHwc3gxP+SpjBHuy/Xv3i29sWOSHYbNVXX4q3SFWu+yNCdQejprd+fbPbFyatFBAbS71IBpMwYtz8QuLhIXlY7yqYzKPv1pHC7/9kn1KOel4y9KL3dMspJAhz9Jq3HX3NxL31T5yPyn7E+z2v9S7wK2GzhX5QsPsteOP/YvS0MCPfnbs4bfkpG+IWe3drfSP32uT7MzqdsVifMrR6WEHn8sSmNe5rMTDcWfVS+Uy34+vt4xySKB7nznpmD+7lenzzi+3jU398ofmR31NGVE6fsx3QvYauBcZXNRPjQ8zkb/stTnonTmaNDtrvODvwEB1NtReg2YH65MozOlK2/q/VM71sLiFHlW0PKzK5Vz6cWhQ54kxS/SXfyOST6pjNed/vxYjbvmJkug8kUN2fseVaZ674Uvfuvv//FXfZPsbZehc5VlSSmPBr8rcWrvQ0QP8NuWS9NrbtezRFAUbc6Urr6Z9xSOrORVo6WaLEenMw2lY55kOi/1TDJPoPRqtKQad87NvWrKJFW8/cimbwHbDJ2rUvalhz0DlqU9gb44+FuvyQrqPrt16wQRQSE0ORMaYyPvnGb1kCCrk6eEuNdY4UoHIMkv9hnVNck8gdJDkKQad85N9crn+lFK07n43gVsMXSu0msPSi8MWJb2BOq9AKHw+B9PJ4+6xhav0bNEUAAtzoRG2cS7Jlo6xjk51fJkHC45bqh/BfJUU7smWTt2Kr2ze246EqjjG5m9C9hi6Fylh1+l0BmwLJ3ngc566Gn2YR25urtGz7rKRDiLFmc642zhXVPtL9CPGn7XcCRxVgKVSu/CEyiJnVKu3DaBBt8/KJndjnG4nQRaJi3OdMbZwrumOqBA5+Nw6chZXAKd3lo/UxOSQKemKJ/5uSiB8nthZ4ZejjDgAoudBFomLc50xtnCu6baUieT2lhcbFW6T/T5o3CtByv9tT4913OlBOq9HHvoXOUv3/yUfhnozGVJlb+R+pVf9szoUemrsE12EmiZtDjTGWcL75pq9TLmmuQ7pC/VftuVQI2THJZAzXPTkUD9o3Bn3zJt6FwVxT//1m4+E2ctS1n5rjxfHHJCSAKtlRZnOtMnUO/oTXq/tdNfnXMtXKongTrnpiOBOir60O//dHxa3ySyL/4USTTgg3vvTNp7a4QP//7f/stXP19cHJJ9mgRaHS3OdKZPoL5959+mlbAWOKf6Wv8+UOMkexKoc266Eqhy/4Gb2X71z/7df/3lgAVsM3SunuZt9Mn/Up6/s5alQWkwrpZj1RNxvXcbkkALpcWZzvQJ1PNkmcp93F6q/r6pAnZNsieBOuemK4GqBfj086WPzhk6V8Xrd77Q2ETDlqXR464n1FUufqt+XbdGAi2UFmc60ydQPspWPOTgzhe/9e9/9MvKq9n+/anE5wlUPAu0fl+4xkn2JFDn3HQlUHojuNNrh+l0L2CroXOVvF7JgHOWpUXHU7qzqZe+lORq7BXS4kwnIIGyYbasdmWVPK9kvy0K4L3GxKlWwONPHZPsS6CuuelKoOKmcaUbUR8+pnMB2w2dq+Q3lSU7Z1laPTk+KrWeQPln7p+O8bvKoyqaSKCF0uJMJyCBimzZ3wfmF58v1dFkrC0veJVXjmfJi2u33uybZF8Cdc1NZwKlBTl/GE/5fnJNk+xtlv65Sl/flU/EnLEsXT7808bRtYZH8nXlqgRaKC3OdCISqPYo6VMlS8d28q+3ND7nZ1cuyu2T7E2gjrnpTKCmgpx9SsckOwydq4NHjb89Y1m6PX67IYEapt51u9WdBFomLc50ylv4/VKpvl9UrMdv70dmXi3dvPL4uxe+3HRP5Z66Ubrc7eDN8gvlQ4l0NO2F0ltPBbV1kr0J1P7WngR6+m71fcUr7ZPsMHSuDoo0KA+DDV+WS9Sm/pWuv95VOsH9w8w++dsv7L/1mqyOD48d6duNZ8qqE2F8WpzplLfwR2mN2he5Q/06fV3xznfyF5+cym/D+FJf3aiUsnyy+XFO5VAivbvOpz9MdsQ/+ePeSQ5IoNa39iVQ9SjoxdPstE6yw9C5Kn92JdoGL8tFKlPvfuDErtIJbubgq8W3jooZeVxcAn7nfxkwEcanxZlOeQu/qTCnUZ375RPr5YKXPEKhIYJ668aTt5sqd7WqpuNwRZKc3lp+QFrzJIckUNtbexPo6W/Tb9CUnhnaNskOQ+cq++iWxyMMXZbLJDseuxd7BhYrneBmRdx5p3a7i0dp52pItJ0EmpwWZzrlLbw47Hl6LHFvZv/f7cfaDrewTG7D/Ol9eduftK5XjgF148nfvbEffLnz6mn0JY+62qOC9qmYJMmHb9/84Z36swQaJjksgVre2p9AN7vw3391/7sXvvRONYabJ9lh6FyVlquhaA9blkvdTH2/ml79Vm+qVjrBfu1+4XjXucffLF3zd+hch4Oj+lilBJqeFmc65S18X9aSo5yXsl/lYzn7ylGcokmuUasdBI1QNy79nidRKp3gcED76R/n/z4ca+/3ePLOtQ+jC46muTotznQqW/i9Yq86H5C7n+6aFlcq3DsdqTxq2LOWQFQ7wf1k7bV1rvohmgSanhZnOvUykSXLvdMt106jYjeV4/Sd/T9qDwMJRMPhdXGOMdtrqXWu+jCcBJqeFmc69aGSY43Pj2welb5xUlypcBi/v/PttjiQQDScYizSJtvRuV+7z4MEmgEtznTqp4uLW2MehuPSa96KqwKeFpetfaU5ECRQk4avsFZadVV21cssi3zJe9e90teZJNBMaHGmU9nCb6pAfj+C9DEBieIbk8cLkZuPgyRQk00nUHqWJ8uaf/pm6Vo+CTQTWpzp7Gpf2shrw6EYtCdQ8TiZpi+7SKAmm06ge+UHa+SngZLIedTUDBJoelqc6VS38OPFcMU1b6WvqNYcb5ZwwT0RLiCBlqbUCYph3b37xaNt0+V2LdxMaHGmU93C7+8LxWnntDuBsnuqTPI9Qgm0NKVO0HQaqNy5yhfGNU6ESWhxplPdwh/ty8DpmuzyQEk2kvKoMqQ/Sd1YQQJtTKkTVPtMfreN8uP13BNhDrQ406lu4TdV4dP//2nfdF/3S8/FfvNpZcD+nj1XGpU6QXrX9bz/lDpX8y0R9KQAWpzpVLfwm/3TT/7P5W9u5Hum+9vCHX6/LxYvJQ/mqd+bTN2g1AlKYVOkUdK59n2qaShST5qeFmc61S38ONhVfkTD4Yrrwy1Ik8eOvri/FWd6s9KuqbJBaScojeYWh82HKy0Ptyo9XNLS9LAHPWl6Wpzp1Lbwe5VaUDyMc1e6rVei4eFr6galTpBedJCkUdq5mp82pCdNT4szndoWfn9XOao5xc3pBginx4o1PyVG3aDUCe6XH71XpNGpc7U8bUhPmp4WZzq1LTy/3ODkyS/e2NWeonx8tPKdL70zbKpsT9oJ7jWdBto7dq47jc96r06EaWhxptN0DHT7K57VDa7TCfSk6WlxptNwLVzTiZ1bTpUNkkALpcWZTnULv9/09Z5bT5UNkkALpcWZTmUL77sLz2VTZYsk0EJpcaaTbOFPjpcmNV4Ue/lU2SoJtFBanOmctvBr3nhN3UACLZUWZzrVBHqxfoODW02VzZJAC6XFmc5pC/+n77U98fQ2U2WzJNBCaXGmsxtlE19O3diN0wDj281+ziXQQmlxJjTGJj732ngy+zLebvYRdI25m/kirpMWZ0IjbOPLKRtzL+LdZp5BV5i5WS/famlypnT1rXw5ZWM5c9pi3hl065mb88KtmDZnUvndiec1qfEtZ07bzbu9dxf3h8vfya1pdCa1u7roJRpkOXPaadYtvo2etDZanYndulAssGwsZ077zLrRN9CTVkezE+GWsbOskrGome0197ZfcUdaI40P41pbkVO1uR4dCUa1wnotg7gW3QjGtM5aLYO4Dp0IxrPaQi2CuAp9CEaz5jItg7gCPQjGsvIaLYO4Nf0HxrH++uxiZm5L74FRbKI4iyBuR+eBMWylNMsgbkPXgRFsqCzLIC6n48DVbasmiyAupt/AtW2uIssgLqTXwJVtsRzLIC6iz8B1bbMUuzKbS+gxcE3brcMiiPPpMHBFm67CMohz6S5wPVsvwTKI8+gscC3KrwjiPPoKXIniuyeDOIOeAteh8mZkEIPpJ3AVqm7BldkMpZfAFSi5JSKIYXQSuD0Ft0oGMYQuArem2jaQQfTTQeCWVNpmIohe+gfcjjrbSgbRQ++AW1Fku8ggOukbcBsKbDcRRBddAy6nvPaTQbTTMeBiausgMog2ugVcSmEdyE0SaKFTwGUU1TOIIBrpE3ARJfU8MogGegRcQj09mwyiRn+AC6ilFxBBVOkOcDaV9EIyiDKdAc6ljF5OBpHSFeBMauhtuDKbhI4AZ1E/b0sEUdAP4Byq5xXIIDJ6AZxB6bwOGcSBPgDDKZvXIoLY0wVgKEXzmmQQEggGUzGvTAZh9cMwyuXViaDNs/ZhCLVyFDJo46x7GEChHIsM2jRrHvqpkuNxk4Qts96hlwo5KhG0XVY79FAfRyeDtspKh26K4xRk0DZZ5dBJZZyGCNokaxy6KIuTkUEbZH1DOzVxUjJoc6xtaKUgTsyV2VtjXUMb1XB6ImhbrGpophTGkEFbYkVDI3UwjAzaDqsZmiiCgUTQZljL0EAFjCWDNsI6hhrlL54M2gRrGKrUvjlwZfYWWL9QofDNhAhaP6sXSlS9GZFBa2flQkrJmxcZtG5WLSTUu7kRQatmzcKJYjdDMmjFrFfIqXQzJYNWy1qFjDI3WyJoraxUOFLk5kwGrZNVCnsK3NzJoDWyQuGpAFoCN0lYIasTBNBCiKDVsTbBRdiLIYNWxrpk8xS1JZFBq2JNsnUq2rKIoDWxItk49WxxZNB6WI1smlq2SDJoLaxEtkwhWyhXZq+EVciGqWLLJYJWwRpku5SwRZNBK2D9sVXq1+LJoMWz9tgoxWsFRNDSWXlsk9K1DjJo2aw6tkjZWg8ZtGRWHBukZq2JK7MXzGpjexSslRFBi2WtsTmq1frIoIWyztgYpWqdZNAiWWNsizq1ViJoiawwNkWVWjEZtDxWF1uiQq2bDFoaK4vtUJ5WTwQtjHXFZihOWyCDFsWaYitUpo2QQQtiPbENqtJ2uEnCclhLbIKStCkiaCmsJLZAQdoaGbQMVhEboBhtkAxaAiuI1VOJtkkELYD1w9qpQ5slg2bP2mHlFKEtk0EzZ92wagrQxrkye96sGdZM9UEEzZkVw4qpPTyVQXNmtbBe6g5HMmiurBTWStGhIIJmyjphpZQcUjJolqwR1km9oUIGzZD1wRqpNdSJoPmxOlghlYZGMmhurAzWR5mhjQyaF6uC1VFiaOcmCbNiRbAy6gvdRNCMWA+si+pCLxk0G9YCq6K0MIQMmgnrgBVRVhhIBM2DVcB6KCoMJ4PmwAoAFmQ3a9GtszhaDJi96GQ5X3SLLYR2AmYtOksuF91yC6CNgPmql/WHMyaEzqWBgHlaSu40kULDaBxghhacPjkh1E/LAHOzgvg5EkI9NAswLytJn5wM6qBRgDlZV/wcyaA2mgSYjTXGz5EQaqQ9gJlYb/7syaAGWgOYh1Xnz54IqtEYwCysPH6ORFCZtgDmYAv5syeCUpoCiLeJA6Ajh0EJDQGE21AAiaCUdqDNvcOG8sl/iJ6PsnHnaoplfvJ3b3z+5kPuvPqvfjzmxyzKKYDu7v/1SlPd/vjPb175xN+MnA7//WuNn//e6bP3f/HMX/ZNZ78cz7e9KIIKmoE2EmgMj99ObtTyogw6SA6APnr55t+f+qChbB9eacyma5okgZwMKmgF2kigEfy/n9+VfHW0T1qO0gjc4Uhn93pD0d6HQH/lv62JEkgEZTQCbSTQ9d3fVYmgykVw+1rfVLsP0dRV069jqgQSQUfagDYS6Op+WzkC2ntzpM9ajHIAHQfbGk73PGg7Nrqu/gQapjeBRNCBJqCNBDr5p3/xzhWm/uR79QDaffpXV5jyglUCqHUYbl/SG88PXdd0CSSC9rQAyxKUQN/c3fn2raMiPwR68Uc3P/ziT48/3PmPt53solUDqG0YriUZrm7CBBJBTyUQSxOWQDe+8svbTf1+6ajnOM2tnwmqBdAxA2rV/uwIuNCUCfRQAkmgmcu+O/Lqt3509lsfv33zzjtfrl3v2zPJx280vmvQ3Dz5xc2bdy/U3/34+6/ua+0LX3qnehxx9gLWM+KyNmp+V3cC7XZfPH89JB/5vfJBz6PjNF+6xSQXr34I1PyVoPp1CO+/9Xs3v3r2979bL+y/fmu3e+ZzPzsjL07OT6BfH2bkM18vjxDmCfT+W/vzWp/9buP4oYMgCTRrT949nS54MT8RkZ9MKMrk/WQ0J3vxzeSt5TxonGT+vptJ/HxX7JhXq3HLW7O/u3nDz4vz7F8uBU0+3nT4iD9KX2qbZIdBc3W/WIjC/VL1716W9gS6SdHvVEP09FpZbSrZHxYnfrJBuS0n0H75a3X5cM1B5ZTP4fqE5OTQT18r2vmZb5Tq/if+5vC3N7//51/bVcJkwIHJudfC/fq1ZD72n/x8+lGlV0VQg40v/rw9Tir3ja9klSs/m/BS+cdDuc0TKD3nnZ5oaJlkkUCP8lferFXjtrcWCZQU9VL1rVyB/OI/9M5Nl0FzVa31RcNkv+lZlq4EummlymDc4AR68v/92z9749XPv1R544YTaL/4LSFQud55nwBJKP1VqaU/V7ywT6D/9LXs169XL144TLjncrozE+jBy8l8vFJNoI9Kr4qgBtte+nmrVba8VOUxcbiON8+aY23NfvpO6aKrUylsnWSWQP/hm8lbytW49a15At0rvXoq/u9W3nd6qX2SHYbN1fGvkux9lDRZ77I0nGUq3cpg98X0uHJwAlVl+w4bPg+0a74XXH0YrpILdytN/Xzywif+WfbLT32wP5hKk2z/c9/JnPMSqBQxu93vlxPo2fKrjdm3k0DMU8Olu3mtupeUuKy0ZsW28Xrfoua3T7L6ylefVqpx79y8Wnk1/5pL/TuYeb3vmGSHYXP1qDq99G29y9KYHU/+Nv0yTzJ0d3EClVJxk3bNCVS/M085Ow5Xy+3+4Ac3//zNXx+qfB4Zx2jaj3j99LVXjiePkjQZcnXAWQl0+OPdc/uTPO+/la32JIF22emoj39ymMnGa8l3Eoh5un8qdU/+8/Hf+T59XvNeql1QdSqu+yt+H+c/vdk3yUpRPvy2VI075qY4+Dm8mB/yVMYI92ejfvdu6Y0dk+wwbK7Kg26VAa/eZWnLjvSE1unq7EsTKJvHuX3hakL7xW9KgUNylIbhStlxCKji5dIfH+r+8YePf1YduxsyCHdeAh0vHM8+4Bgz5QR67ofHn379teoS5XYSiFkqn8qo7NOfxuGy8l87ynnp+IvSyx2TrCTQ4U/Satw1N/fSN1U+Mv8p+9Os9r/Uu4DtBs5V+cKD7LXjj/3L0h4K5cG440UelybQ4EVerV3bAxkOlT2JgfLJ/31xT6p5epncoe6fDnP2WXX6yyGDcNlhTaNaAh3+9vRphysoSnNy+rTWOzrsfx+9HiJteuFnrVJC75eLWl7ms0vXij+rXihXugC4Y5JFAt3ZX+r1u1+dPuP4etfc3Ct/ZHbU05QR+Y+H17oXsNXAuSqfYykdEvUvS9dclAfjbjGCVmqnbdq1JVD1zjylY5naocz+r7M/bgqnIskGfUXnnASqRlqaf5VzWe1fqN1JIOboWAuLU+RZvcorXmW/u9iPzpOk+EW6o90xySeV8brTnx8rZNfcZAlUvqghe1/1VMe9F774rb//x1/1TbK3XYbOVVbfS3k0+F1tksG4yxPo8TdvO4Xl27UlUOXOPOUgqR3KHF5+vaj76WtpdA0ahDsrgaqRdgjOJIFaszC1k0DMUBYx1WipJsvR6cKz6pce0xMgXZPMEyi9Gi2pxp1zc6+aMkkVbz+y6VvANkPnqpR96WHPgGXpPS4pBuMuzo/Hp1N527VrfSzqYdCqdOKnfB1C+VDmvaK6V6+/TofhBg3CnZNAtUirjgemc9J6ALaTQMxQ9ZAgq5OngpVc/Jycv6+d3U7Gn7ommSdQWlGTatw5N9Urn+tHKU033+xdwBZD5yq99qD0woBl6U2gD9+4ZQLlF2hs+raku/YEKt+Zp1y97+6qRxMPiterdT499Bg0CHfOlQjlYKx8RPXTJFCzTS/8nJWOcU5OFSsZh0uOG6rXgKU1tWuStWOn0ju756YjgeqzM3wBWwydq/TwqxQ6A5alO4Ge/PwLxbvefHrRlQgCaG/XGkDZaZTXT6FwGtA6jtDVHI83anX+NAw3bBDunASqH1Qlh2fnJNCWq/CWl33W+gv0o4bfNRxJnJVApaK58ARKYqeUK7dMoN+9W70S4fwEepRNYtBNINZr15FA6TDc/qP3FgoAAB6tSURBVN+Va6rrTglUio/TccqwQbhzE6j8HZ/TwdjwBNr4QdCWl33WBhTofBwuHQmKS6DTW+tnakIS6NQU5TM/t0qgx28kb8luz3N2AiX33tu0XUcCpcNw++L9evmVuuPf1hLodJXCsEE4CTSxLS/7rLXUyaSqFc/bLN0n+vxRuNaDlf5an57ruVIC9Z6CGTpX+cs3P6VfBjpzWcrK30jNb1F6bgLlV9B/p2dJV2/XkUDJ1cyV0y3nJVARCgMH4STQxLa87LPWe8eW5DukL9V+25VAjZMclkDNc9ORQP2jcGefyh86V0Uy5N/azWfirGVJXOuuPOmtzLdt3wytSXC6M8/pSreD5NrrunoC7fNiHx0DB+GcB5rYlpd91nq/HpPeb+30V+dcC5fqSaDOuelIoI6KPvT7Px2f1jeJ7Is/RRIN+OCBdyb98uV3Js0C8MW+Y70t6Eqg4mY7+3+k97Np/2bNw6YEysfwBg7CnXstXPlOO66FO9umF37O+u7c/9vPN9a62lXV9e8DNU6yJ4E656YrgSr3H7iZ7Vf/7N/9118OWMA2Q+fqad5Gn/wv5fk7a1lypacz3OZx3VkAbfsiuFxXAhV35tnX+dJYV1eWNCTQcWRs6CDcOQnU+30gCdRv0ws/Zx3jV8nLuZeqvy9+kdwToWuSPQnUOTddCVS999np554FbDV0rorX73yhsYmGLUsufULdmTNcku04CKCDzgTKh+HK1yE8bBz8evZPfnCq85X4OA7D1b+70+KMBKo9t7V6TwQJ1G/TCz9r2Shb8ZCDO1/81r//0S8rr2b796cSnydQ8SzQ+n3hGifZk0Cdc9OVQOmN4E6vHabTvYCths5V8nrpj85bltxVntLtdtgVnQmUDcPl53EqdT7NiOQGoQ0JdIyw+o0UWpxzb+zqo7uPt8o+faoE6rfphZ+1bG85i5KsAubBko/BFffGriRO5WuZ2U8dk+xLoK656Uqg4qZxpRtRHz6mcwHbDZ2r5DeVJTtnWUpTqj4c9XwND0va8o15dp0RdPhK0CsP6oFQueVa+u3VpgTaT+EPuy5fKDkngSr3xj4+ru7MBNpJIOYpz5b9WYdflL9Dn4y15UW28srxYq3q7cvaJ9mXQF1z05lA+fztH1eUP4ynfD+5pkn2Nkv/XKWv78rfvTljWXI3i1FcfX250sm7wnYTqPsg6FDfP/UX9efqHJ8P9I3sp48PT+zOzhQ1JdB+Os8OHYS74PlA+TPCP6o9H2hgAkWvhVDbXvpZa7jEKt+9Ty/ozb/e0vicn125KLdPsjeBOuamM4GadvuzT+mYZIehc3XwqPG3ZyxL8ZZ/8U71Vxe4V/vcPQnUIn8Wd61wH5+Reng06cc/+b39v9Mn1NXi427LZBpd8xmpEqjftpd+3up7zG+WXygfSqSjaS+U3noqqK2T7E2g9rf2JNDTd6vvK15pn2SHoXN1UGRNucoPX5ararlse9sJ1DcMt2sYPavfGi59SnctPh60TKbRWQmUH/ekzkug/Tui10KobS/9zFUKZf4l+vw4p3Iokd5d59MfJsXukz/uneSABGp9a18CVY+CXjzNTuskOwydq/JnV6Jt8LJcVcvdGDacQN0RlN39oGH07DjydlJ6CF0tPo6BNWwQ7swEevggjaDn/vXuzATavyt6HcTa+OLP3JO3myp3taqm43BFkpze+uVf9U9ySAK1vbU3gZ7+Nr2ZTelunG2T7DB0rrKPbnkQ6dBluarG6xC2nUADvhLU/O3TnySV/7nTeaLGBCpdo9bnzAR6+OvXivn43Adn3xNht/UAkkAz9+Tv3th/neXOq98uLsPK999rjwraj8MlSfLh2zd/eOfLtbLeMMlhCdTy1v4Eevr08fdf3f/uhS+9Uz2b3zzJDkPnqrRcDfcAHbYs19R8jk4C9dyZp3odQu79tz6zf/Nn/uSHlTpfj4/DdIYNwp2dQPsZ2Z+KevbrPzv/rjw7CbT15V+bS7/nCTE6I+hKht4T7taaA7CVAJJAayOBWJgJEujMYLjdBw081toTQBJobSQQCzP+QVB12Oxq3vtMfjegiz7IIdBTCbQ2EoilGT2C6s/xuZL3KhOu3qWnmwDa0wTrsoIEarlibM8zdVZp5AgafFvssx3vG1T8eM4ldwIoow3WRQKxOMeVO0ZE/GZ/s4LXxjoEyr5o9Nnv/mz/UT89XJY99BBoJ4CONMK6SCAWaKwIei/vOa83/K7BuZcrPKhNYeCxlgDKaYV1kUAs0UgRlCdEOjR2zQSqTuwZAXQuzbAuEohlGiWDstu2lcbgrppA6S0Rdrs/+GH/Gx7KnxINsS4rSCC2aYwI+vitl/fnaUq/u24C3WTQv/nM4dYNn/36wHNNAiilJYA52I1yGDQ3OwFUoimAeVh/BsmfKo0BzMRu3Rm0E0A1WgOYjRVnkPxpoj2AGVlpBsmfZloEmJUVZpD8aaNNgJnZrSmEimu9o1t1lrQKMDvJl3SiE+QW0u8aRbfoTGkXYIbK3xWNzpKzVb7rGt2as6VlgHna1UTnyhC1mY5uxlnTOsCM1WNorknUMJ/RjTd/mgiYu8YYmrfoJlsI7QQsRHSqDBPdSouitYCFic6YFtHNskQajbMtbXNb2vzCZtguOdfyCvry5hi2wWbJeZZZzZc517B2NkrOsthKvtgZhxWzTXKGJR9KLHneYaVskQy38Bq+8NmH9bFBMtTyDyKWvwSwLjZHhllH9V7HUsBa2BgZZC2ley3LAatgW2SANdXtNS0LLJwtkX4rq9krWxxYLhsifdZ30LC+JYJlshnSY5XVepULBYtjK6TTWg8X1rpcsCi2QbqsuE6veNFgKWyCtFv3gcK6lw6WwAZIm/VX6PUvIcybzY8WWyjPW1hGmDFbH422Upu3spwwS7Y9mmyoLm9oUWFubHrUbevAYFtLC3Niw6NmcxV5cwsMM2G7o2KLhwRbXGaYAVsdZRutxRtdbIhloyO13YOB7S45xLHJcbLtKrztpYcINjgKWy/BW19+mJztjYz6qw1gYrY2jtTeA80AE7KxsWfnP6clYDo2NZ7a8y/RGDAVWxp2+yu0B0zEdrZ56m2dNoFJ2Mq2TrFtolVgCjaybVNp22gZGJ9NbNNU2Q4aB8ZmC9swu/ndtA+MzPa1XeprL00Eo7J5bZUd/CG0EozJxrVRKutAGgrGY9vaJLv2w2krGI0ta4PU1PNoLxiJ7Wp7FNRzaTEYh81qa1TTS2g1GIONamNU0gtpOLg+29Sm2JW/nLaDq7NFbYkaeiuaD67MBrUdduJvSwvCddmctkL1vAatCNdkY9oIpfM6tCNckW1pE9TN69GWcDW2pC1QM69Kc8KV2JDWz077tWlRuA6b0eqpliPQqHANtqKVs7s+Du0KV2AbWjd1cjSaFm7NJrRmdtTHpHXhtmxA66VCjk0Lw+3YfFZLeRyfNoZbsfWslNo4De0Mt2DbWSd1cTKaGi5m01kjO+ZT0tpwKRvOCqmIE9PgcBnbzerYJZ+eNoeL2GpWRi2Mod3hAraZdVEIo2h5OJ9NZk1UwUhaH85lg1kRFTCYFQDnsb2shl3weNYBnMXWshZq3yxYDXAGG8s62PmeC2sChrOprIKqNyNWBgxlS1kBu93zYn3AQLaTxVPv5sc6gUFsJUun2M2RtQJD2EiWTaWbK2sG+tlEFk2VmzErB/rYQhbMbva8WT/Qw/axXOrb7FlF0MnmsVR2sJfAWoIuNo5lUtmWwpqCdjaNRVLWlsO6gla2jAVS05bF+oIWtovlUc8WxyqDRjaLpbFDvUTWGjSxUSyMSrZQVhzU2SYWxa70cll3UGOLWBI1bNGsPqiwQSyHneilswahzOawFKrXGliLkLIxLITStQ7WIyRsC4ugbq2HdQkFW8ISqFmrYnVCxoYwf3aa18YahSObweypVitkpcKerWDm7C6vk/UKTyXQ3KlTq2XVggSaNTvKa2btgg1gvlSotbOG2Trdf7aUp/Wzjtk4vX+m1KZtsJ7ZNH1/ntSlzbCq2TBdf47sGG+Jtc126fgzpCJtjBXOVun3s2OXeHusczZKr58ZtWibrHc2SZ+fF4Voq6x5tkiXnxNVaMusfbZHh58RFWjjdAC2Rn+fDbvA6ANsjN4+F2oPT3UDNkZnnwc7vxzpCWyJrj4Lqg4FnYHt0NNnwG4vKf2BzdDPw6k3VOkTbIReHk2xoU6vYBt08lgqDc30DLZAFw+lytBK52D99PBAdnPpon+wevp3HPWFHroIK6d7R7GDSz+9hHXTuWOoLAyjp7BmunYIZYWh9BVWTM8OoKZwDv2F1dKvp6eecCZdhpXSrW9lN5XoBeVik/WR4aKbBHI64wVUD/qE9pHhopuJrdMFzxVdM3bKxuxFd5CzRDcWm6b/naO++T6chrqxGGF9ZDi9idnQ9wabR01RNuZsHn1kOL2JYDreMHMqLcrGPM2pjwynNxFJrxtifpVF1Zib+fWR4fQmouhz/WZaWlSNGZlpHxlObyKEHtdj1qVF2ZiFWfeR4fQmpqe7dZp9bVE1ws2+jwynNzE1na3DImqLqhFqEX1kOL2JaelqrZZTWlSNKMvpI8PpTUxIR2uzqNqiaIRYVB8ZTm9iMvpZi6XVFkVjekvrI8PpTUxEN2u0xNqiaExriX1kOL2JSehlTZZZXOy4TmmZfWQ4vYkp6GMNllpcFI3pFH3k7v4frzStjo///OaVT/zNyCv9vZsPeeYvR5iw3sQEdLG6hebPQ0VjOqc+8tHLN//81AcNa+PwSmM2XdNoCaQ3MQE9rGqpB0BHisYU0j5yONLZvd6wLsbLhqk+RW9ibDpY1aIDSNGYRKmP7CNg93x9TRyiqeH3VzZqzulNjEz/qlh4ACkaEyj3kcNgW8Ppngdtx0bXNe6Rlt7EuHSvsloAPTh+Pa8+0H8oPJeM898t3vXfvzZG8VA0RlbpI23DcHfbzg9d18hjfXoTo9K7ympHQFkC1bfx93YzTaCHasa4qn2keRhuv3LHvw5h/LNNIogx6Vwl9TG4LIFqteS45zvXBLJax1PrI4esqQ3DvTfFpdgPJ7jeQW9iRDpXquEkUJ5A1fGUbBBujgkkgsbU0EeavhJUvw7h/bd+7+ZXz/7+d+sr7Ndv3XSEz/3sopXdl0Af//SP91312T/4wUWT15sYlb6VargKYZ9Az75c35292fCf+b1bJtBo1IzxtPSR6i7KYQ8lOTn009d2uWe+cfr13X3PyvZmnvnntYG7u/2X03Un0Md/9XLxuc9mn3sIx9PHvFea+Qf1TxRBjEfXSjTs3h62yE/9Re1U834rfu61GSeQFTuOpj5yGIarpEC5rj/8q13qc8UL+wT6T1/Lfv169eKFw4R7LqfrTKBfv9b0ue+lKVO5c0NT79SdGI2eddJUXI4J9JPajuF+t/UP/7xha+03QQKpGWNp7CMNw3CV6xDu7sqeT174xD/LfvmpDx5U0uTBrv9cUlcCffRy5XOP8VaabDk+mweGdSfGomOdNBaXQwL931+rVoL9INz/Nt8EMg43ksY+0nBnnnJ2HK+bPJyJ+c1fv5zG1TGa9uNyP33tler42JBBuK4EOoRLdoLp/T/eFZ9bipkHu3R+9kvScAm57sRIdKxC8+7tIYH+nz+vjobczX9bypL2k83HVz773YcXJ9Bv/no/hc98PSsPv84m2PKFEyVjHM0JdEiOUgyUsuMQUMXLpT8+JNDxh49/Vh27GzII15VAh+B77ofZTz85zUXaBd8rHZS919w5dSdGomMVmovLIYE+KI2cZ7Xhleoea8vJ5ofpcPxNPWi8Fq66t3v6ef9HN0Xpv+UDKoc3fFycV3iuufgYNxlF815KVsaTrlAezLpbDon0Mrm7u9KK32fV6S+HDMJ1JNAhwJIDmgfFZyXdOftWQfZn+5+aJqY7MRL9qtCVQB9VroZ7sN9QKwnUdrI52/ssAuSiBLqbTqH4MtJpB7qpZkQ36Aq1BFDtzjylY5naoUzSm5rCqehSQwbhOhKo9krxi2QG9jP3meKnlkE43Ymx6Fe5lt3bYwJV9w3v5r8sl4vU86VJnBxOPJ+bQP97OoVSHrXd+UXJGENbAlXuzFPuGbVDmY9Pg7p3K6+l0TVoEK4jgWoBVlwekXTn/b7U//py/jkN12If6U6MQ7/KtRSXYwJVxscPg3DlOtN+svl4Pvi5/Rmb99/KUuPMBDpM4GbSHx+Ppm72WXf708u/+av2gyAlYwxtCZSMcO2VD5mrQ7jp2Zbq9dfpMNygQbj2BGoIsKJX3U0j8HA5+Gl2GjNPd2Ic+lWuO4HKwxMPDhtqmkB9J5vzQbnS/UzPSaBXSu/Pp/2TNOqUjNG1JlD5zjzltVm/9OR0qFFd72mfGjQI155A5XNKxd8eenExA/vPe37/Sc/nS9GceboT49CvMrvuBCoPwx2+yV6rFs0nmw+1qXyu+YIESr9Akuyntj+DZqdmXF9bJ8nXS7ZaytchpCftEsf9mVrKnIbhhg3CdSdQJU6Ko6rs8pbjH72yn8bh162DcLoTI9GtMm21JUug0o7sfvt9vrS/2nGyuXqHyuSyqeEJdJpA5Q4wla/el2tGdJuuTnsAlYbhHuxq11R3JVDp+OjUcYYNwrUnUMP7i18VO1SHS2qy/x6n1ZJ5uhOj0K0yfQmUVpWPjmdukwTqONlczZbk2/LDE6hyDNV2Uy8lY1wdCZTuKJTPprQk0PFvawnU3nFaXJRARdYcj+aTDt2WeboTo9CtMn0JlI6sZIMWSQK1n2yuHR1VjpyGJdCpSFVG98u720rGuDoSKLkzT2X467wEKkbCBg7CXZhA+3l8Pj8NlP+v9Vps3Ymx6FaZXUt1KUr8acc2217TLGkvJPXzwacLoYYn0KkSDU2gnZJxfW2dJF8xx1VRua9Acu11Xb3j5FcDDByEO/s80OkkU3aBzSsPs28XHF5uuVuH7sQ4dKtcXwKdztJmYxZJAnWcbK7XgdN2fmkCJdPrSqDoFl2hjgQqrn+sfneseuuMkoZbNGW7OgMH4S67Fi4ffnvvdALo5qe77fc41Z0Yh36V60ug05Wq+WZ7Ki0dJ5vru7ISaLk6Eqi4xKQ2ltWVJQ0JdNzVGToId873gdIrJ49fKLibdaabeX7mL4sL5BroToxDv8q1jLCcSnw+DLffjg+/6U2g5sEUCbRcXQmUD8PVvtVZ7wLvPfsnPzit50oCHXd1GsbQmp1xT4T0KpbD+FuROYevWHcMwulOjES/KvQlUH61QT52LoG2pyuBsmG4+rc6K5cvli5vbLpR+iHC6te2tLjkvnDZ3H7q/0q78vMd12LrToxEvyr0JVBe+/NBuOp5oOaNN+g80E7JGEPLgfJpXTQeSVS+rpx+e7UpgfZT+MOuyxdKeu6NXekrp86yP/Xzr4sPee/4yN+Oa7F1J8agXxWaq8upxGcxUwzCVROoeQBjjGvhhiVQdHuuUlcCHZ+G8Bf1QDjesil/YMfxyRqnCwJqHWc/nWeHDsKd+XygU096cDpOP/3UdtSlOzESHeukJ4Gy6DjelfRQSspXYzdvvfXzwUOezlC9q48EmomuBCpuj17rCcfb1h5uTvvxT/ZPFizdNLC263K3Ow7q0x7+jNTnKy/mnSe736BBOCamY500HgQ9KG+ln/rgQbHBd90T4XSy+aJ7IiRnCi5KIKMmI+kfhmuq4/Wr9dOndNcS6EHLZBq9V510Mv2PXq78Ou0qd9MZOc6hQTimpmMlmspLUuKPX/S4W2yoSQJ1nGx+UNlHfXDa7pMEqtxdJ7nL2CUJpGKMpiuCssOKhjr+cfnxheWnStUSqDsOKroSKHk470H62MTsjXnPqjystUR3YjR6Vqo7gQ7b7P/0tfIDjpvvjZ2cbK48K/lYpWoJ9CCtBqX7YV+YQNFNuVodCZQV9cYTgulzcpMnqzcm0HE6wwbhuhOoFH3Pfbf0xvJzXavPGU8IIMaja6UadnDTEl8ZZak/H6jxZPN7yb8f/vprSYFIEqhUDrI914sTSMkYUUMfKRf1ttsKvP/W/smCu8/8yQ+TXzYn0GE6wwbhehLopi/+H3+8n9qzf/CDyhsP3bfoOw13UMjoTYxI3yqpl5e0xFdGWRqekdp0svk4pvLM1w8PNX05LRBJAmV/9I0Psj969uXLE0jJGFVXBF3J0HvCTUJvYkQ6V1lnAmUnb/PhkVICtZ9srnxf9Zl/05RAxfHV0Sf+z9slUHQzrtr4CdR8ZBTD/gxj0rnKaju4pRJ/jInSA0qLQtF6srlyPvj1xm+kPnyYvv2Zv/zo8gRSMkY2+kFQ+RmrsfQmRqV3VVTLS6nEl79lXv0easvJ5odpOj3zjeZ7Itz4b8Xbn/vhw8sTSMkY3dgR1P7Qp8npTYxL96qqlJdyNSh9bad+J4TGk80Hv/nr/SvPfv2DlrvyHKb30/0542cOp4wvTiAlYwLjRtDg22KPT29iZPpX1W70QZYxKRlTGK2P/OZmZ+L912ZzCKQ3MTYdrG65EbRTMiYyUh8pLq1+veF3DUa9XEFvYnx6WIOlHgYpGdMZp4/kV0Sm30aNSiC9iQnoYk2WGUFKxpRG6SPZjdw+Vb97zuQJpDcxBX2s0RIjSMWY1hh95OO3biLos+W758QkkN7EJPSyFkvLILus01taHxlOb2Iiulmb3ZLqy07JiLCoPjKc3sRk9LNWu8XUl52SEWQ5fWQ4vYkJ6WgddksoMDsVI9Ii+shwehPT0tU6zb6+qBjhZt9HhtObmJrO1mPW9UXFmIVZ95Hh9Camp7v1m2mBUTBmZKZ9ZDi9iRB63BC72RWYnYoxM/PrI8PpTUTR54bZ7eZTYZJ5iW4WEnPqI8PpTUTS6wabR4FRMOZsHn1kOL2JYDreOXY101cKBWPewvrIcHoTs6Hvnau++U4uugnoEd1BzhLdWGya/ncBBYM+oX1kuOhmYut0wVtRKegzWR8ZLrpJIKczAhBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARDjfwBUcwVdfO/HKAAAAABJRU5ErkJggg==" />

<!-- rnb-plot-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuTkFcbk5BXG5OQVxuTkFcbmBgYFxuYGBgIn0= -->

```r
```r
NA
NA
NA
NA

<!-- rnb-source-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->



<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuXG5jYXJldDo6Y29uZnVzaW9uTWF0cml4KGdpbmlJbmRleDEwJHByZWQkb2JzLGdpbmlJbmRleDEwJHByZWQkcHJlZClcbmBgYFxuYGBgIn0= -->

```r
```r

caret::confusionMatrix(giniIndex10$pred$obs,giniIndex10$pred$pred)

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiQ29uZnVzaW9uIE1hdHJpeCBhbmQgU3RhdGlzdGljc1xuXG4gICAgICAgICAgIFJlZmVyZW5jZVxuUHJlZGljdGlvbiAgSGlnaCBMb3cgTWVkaXVtIFZlcnlfSGlnaCBWZXJ5X0xvd1xuICBIaWdoICAgICAgICAgMCAgMjkgICAgMTIyICAgICAgICAxOCAgICAgICAgNVxuICBMb3cgICAgICAgICAgMCAxMzMgICAgIDY3ICAgICAgICAgMSAgICAgICA2OVxuICBNZWRpdW0gICAgICAgMCAgNzEgICAgMTM0ICAgICAgICAgOSAgICAgICAyM1xuICBWZXJ5X0hpZ2ggICAgMCAgIDUgICAgIDk4ICAgICAgIDE0NiAgICAgICAgM1xuICBWZXJ5X0xvdyAgICAgMCAgOTMgICAgIDUyICAgICAgICAgNiAgICAgIDExM1xuXG5PdmVyYWxsIFN0YXRpc3RpY3NcbiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIFxuICAgICAgICAgICAgICAgQWNjdXJhY3kgOiAwLjQzOTQgICAgICAgICAgXG4gICAgICAgICAgICAgICAgIDk1JSBDSSA6ICgwLjQxMTEsIDAuNDY4MSlcbiAgICBObyBJbmZvcm1hdGlvbiBSYXRlIDogMC4zOTUyICAgICAgICAgIFxuICAgIFAtVmFsdWUgW0FjYyA+IE5JUl0gOiAwLjAwMTAwNyAgICAgICAgXG4gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBcbiAgICAgICAgICAgICAgICAgIEthcHBhIDogMC4yODkxICAgICAgICAgIFxuICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXG4gTWNuZW1hcidzIFRlc3QgUC1WYWx1ZSA6IDwgMi4yZS0xNiAgICAgICBcblxuU3RhdGlzdGljcyBieSBDbGFzczpcblxuICAgICAgICAgICAgICAgICAgICAgQ2xhc3M6IEhpZ2ggQ2xhc3M6IExvdyBDbGFzczogTWVkaXVtIENsYXNzOiBWZXJ5X0hpZ2hcblNlbnNpdGl2aXR5ICAgICAgICAgICAgICAgICAgIE5BICAgICAwLjQwMTggICAgICAgIDAuMjgzMyAgICAgICAgICAgMC44MTExXG5TcGVjaWZpY2l0eSAgICAgICAgICAgICAgIDAuODU0NiAgICAgMC44NDE4ICAgICAgICAwLjg1NzcgICAgICAgICAgIDAuODk1OFxuUG9zIFByZWQgVmFsdWUgICAgICAgICAgICAgICAgTkEgICAgIDAuNDkyNiAgICAgICAgMC41NjU0ICAgICAgICAgICAwLjU3OTRcbk5lZyBQcmVkIFZhbHVlICAgICAgICAgICAgICAgIE5BICAgICAwLjc4NjQgICAgICAgIDAuNjQ2OSAgICAgICAgICAgMC45NjQwXG5QcmV2YWxlbmNlICAgICAgICAgICAgICAgIDAuMDAwMCAgICAgMC4yNzY1ICAgICAgICAwLjM5NTIgICAgICAgICAgIDAuMTUwNFxuRGV0ZWN0aW9uIFJhdGUgICAgICAgICAgICAwLjAwMDAgICAgIDAuMTExMSAgICAgICAgMC4xMTE5ICAgICAgICAgICAwLjEyMjBcbkRldGVjdGlvbiBQcmV2YWxlbmNlICAgICAgMC4xNDU0ICAgICAwLjIyNTYgICAgICAgIDAuMTk4MCAgICAgICAgICAgMC4yMTA1XG5CYWxhbmNlZCBBY2N1cmFjeSAgICAgICAgICAgICBOQSAgICAgMC42MjE4ICAgICAgICAwLjU3MDUgICAgICAgICAgIDAuODUzNFxuICAgICAgICAgICAgICAgICAgICAgQ2xhc3M6IFZlcnlfTG93XG5TZW5zaXRpdml0eSAgICAgICAgICAgICAgICAgICAwLjUzMDVcblNwZWNpZmljaXR5ICAgICAgICAgICAgICAgICAgIDAuODQ2NVxuUG9zIFByZWQgVmFsdWUgICAgICAgICAgICAgICAgMC40MjgwXG5OZWcgUHJlZCBWYWx1ZSAgICAgICAgICAgICAgICAwLjg5MjhcblByZXZhbGVuY2UgICAgICAgICAgICAgICAgICAgIDAuMTc3OVxuRGV0ZWN0aW9uIFJhdGUgICAgICAgICAgICAgICAgMC4wOTQ0XG5EZXRlY3Rpb24gUHJldmFsZW5jZSAgICAgICAgICAwLjIyMDZcbkJhbGFuY2VkIEFjY3VyYWN5ICAgICAgICAgICAgIDAuNjg4NVxuIn0= -->

Confusion Matrix and Statistics

       Reference

Prediction High Low Medium Very_High Very_Low High 0 29 122 18 5 Low 0 133 67 1 69 Medium 0 71 134 9 23 Very_High 0 5 98 146 3 Very_Low 0 93 52 6 113

Overall Statistics

           Accuracy : 0.4394          
             95% CI : (0.4111, 0.4681)
No Information Rate : 0.3952          
P-Value [Acc > NIR] : 0.001007        
                                      
              Kappa : 0.2891          
                                      

Mcnemar’s Test P-Value : < 2.2e-16

Statistics by Class:

                 Class: High Class: Low Class: Medium Class: Very_High

Sensitivity NA 0.4018 0.2833 0.8111 Specificity 0.8546 0.8418 0.8577 0.8958 Pos Pred Value NA 0.4926 0.5654 0.5794 Neg Pred Value NA 0.7864 0.6469 0.9640 Prevalence 0.0000 0.2765 0.3952 0.1504 Detection Rate 0.0000 0.1111 0.1119 0.1220 Detection Prevalence 0.1454 0.2256 0.1980 0.2105 Balanced Accuracy NA 0.6218 0.5705 0.8534 Class: Very_Low Sensitivity 0.5305 Specificity 0.8465 Pos Pred Value 0.4280 Neg Pred Value 0.8928 Prevalence 0.1779 Detection Rate 0.0944 Detection Prevalence 0.2206 Balanced Accuracy 0.6885




<!-- rnb-output-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->


##### 3 Folds


<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc2V0LnNlZWQoMTApXG5jdHJsIDwtIHRyYWluQ29udHJvbChtZXRob2QgPSBcXGN2XFwsIG51bWJlciA9IDUsIHJldHVyblJlc2FtcD1cXGFsbFxcLCBzYXZlUHJlZGljdGlvbnM9XFxmaW5hbFxcKVxuXG5naW5pSW5kZXg1IDwtIHRyYWluKHNhbGFyeV9pbl91c2QgfiAuLCBkYXRhID0gYmFsYW5jZWRfZGF0YXNldCwgbWV0aG9kID0gXFxycGFydFxcLHRyQ29udHJvbCA9IGN0cmwpXG5cbnBycChnaW5pSW5kZXg1JGZpbmFsTW9kZWwsIGJveC5wYWxldHRlID0gXFxSZWRzXFwsIHR3ZWFrID0gMS4yLCB2YXJsZW4gPSAyMClcbmBgYFxuYGBgIn0= -->

```r
```r
set.seed(10)
ctrl <- trainControl(method = \cv\, number = 5, returnResamp=\all\, savePredictions=\final\)

giniIndex5 <- train(salary_in_usd ~ ., data = balanced_dataset, method = \rpart\,trControl = ctrl)

prp(giniIndex5$finalModel, box.palette = \Reds\, tweak = 1.2, varlen = 20)

<!-- rnb-source-end -->

<!-- rnb-plot-begin -->

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABoEAAAQFCAMAAABJpZH8AAABfVBMVEUAAAAAABEAAB0AADoAAGYAGBEAGB0AGCoAKzUAOjoAOmYAOpAAZrY5AAA5GAA5GBE5GB05Kx05Kyo5KzU5PCo5PDU5PD86AAA6OgA6Ojo6OmY6ZmY6ZpA6ZrY6kLY6kNtkAABkGABkGBFkKwBkPCpkPDVkSzVkSz9kS0pmAABmOgBmOjpmOmZmZgBmZjpmZmZmZpBmkJBmkLZmkNtmtttmtv+NGACNKwCNKxGNPBGNPB2NSyqNSz+NWz+NW0qQOgCQZjqQZmaQkGaQkLaQtraQttuQ27aQ2/+zKwCzKxGzPBGzPB2zPCqzWyqzWz+zW0qzajWzakq2ZgC2Zjq2kDq2kGa2kJC2tma2tra2ttu229u22/+2///XPBHXPB3XSx3XSyrXWyrXWzXXW0rXajXXaj/XakrbkDrbkGbbtmbbtpDbtrbb25Db27bb29vb2//b///7Sx37Wyr7WzX7Wz/7ajX7aj/7akr/tmb/25D/27b/29v//7b//9v////HWNPrAAAACXBIWXMAACToAAAk6AGCYwUcAAAgAElEQVR4nO3d/aPkVnkn+DKNjW3sQMLGYF4CzoTBxLMThsWJZ5YNHmY809nswqQ9yWQgHliz617YdaAN426adP/tc6tKUh29q+qW9Ojl8/nB7nvrlko6Onq+0pFK2j0FgAi76BkAYKMkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQADEkEAAxJBAAMSQQjOrebu+T/xA9H2XjztW1pv7kb9/4/M2EXv1XP77GXDFDEghGJYEu9eTdXeHLMmidJBCMSgJd6Lef3yXu/McrzRuzIoFgVBLoMv/0zd1OBK2eBIJRSaDbTCMxtybkGiQQjEoC9Xn89pv1X5bH4A6+et2ZZA4kEGzQfBLowz/d7RoSKDsE+vSPnz79xZ9m//7VteeTcBIINmgmCfTk54dDnXoCZWeBXkp/mNtxJFcggdiAJ393/F7Jt3509lsfv33zzjv1i4F7Jvn4jcZ3DZqbJ7+4efPuhfq7H3//1X0lfuFL71SPBs5ewHpGXNZGze8amEC/ezcbaqsn0HEQLr/64J5rEdZKArF66fdKXnwn/+X3Kue37yeXXGUvvpm8tZwHjZPM33cziZ+fTl1Uq3HLW7O/u3nDz4szIF8uBU0+GHX4iD9KX2qbZIdBc3W/WIjC/VIWdC9LTwJ9+Ebx3oZRuH/8++//yy/k427zPJfGFUgg1u5xUrlvfCWravmp7pfKPx7KbZ5AeUylVbdjkkUCPUpKa6V6tr21SKCkqJdq7v3S+3Yv/kPv3HQZNFfZ6Fdy/iVrkOw3PcvSmRg//0LyzoYESpU/lTWRQKxc9XsleeQ8fZSWvzxrjlUu++k730vfdyqorZPMEug/fDN5S7kat741T6DyRcinovtu5X2nl9on2WHYXNVGvx4lTda7LO0J9OTd9Eq33sO2n6efyqpIINbtyfd2Vfmo0r1TTOSlNSu2DW9Kan77JKuvfPVppRr3zs2rlVfzqnu/9r683ndMssOwuXpUnV76tt5laUugx2+n7+m9386H2Z87BFojCcT06tX0Uv2flZXu/X72k/98/He+T5/vw790+mctSV780U3FzH96s2+SlaJ8+G2pGnfMTXHwc3gxP+SpjBHuy/Xv3i29sWOSHYbNVXX4q3SFWu+yNCdQejprd+fbPbFyatFBAbS71IBpMwYtz8QuLhIXlY7yqYzKPv1pHC7/9kn1KOel4y9KL3dMspJAhz9Jq3HX3NxL31T5yPyn7E+z2v9S7wK2GzhX5QsPsteOP/YvS0MCPfnbs4bfkpG+IWe3drfSP32uT7MzqdsVifMrR6WEHn8sSmNe5rMTDcWfVS+Uy34+vt4xySKB7nznpmD+7lenzzi+3jU398ofmR31NGVE6fsx3QvYauBcZXNRPjQ8zkb/stTnonTmaNDtrvODvwEB1NtReg2YH65MozOlK2/q/VM71sLiFHlW0PKzK5Vz6cWhQ54kxS/SXfyOST6pjNed/vxYjbvmJkug8kUN2fseVaZ674Uvfuvv//FXfZPsbZehc5VlSSmPBr8rcWrvQ0QP8NuWS9NrbtezRFAUbc6Urr6Z9xSOrORVo6WaLEenMw2lY55kOi/1TDJPoPRqtKQad87NvWrKJFW8/cimbwHbDJ2rUvalhz0DlqU9gb44+FuvyQrqPrt16wQRQSE0ORMaYyPvnGb1kCCrk6eEuNdY4UoHIMkv9hnVNck8gdJDkKQad85N9crn+lFK07n43gVsMXSu0msPSi8MWJb2BOq9AKHw+B9PJ4+6xhav0bNEUAAtzoRG2cS7Jlo6xjk51fJkHC45bqh/BfJUU7smWTt2Kr2ze246EqjjG5m9C9hi6Fylh1+l0BmwLJ3ngc566Gn2YR25urtGz7rKRDiLFmc642zhXVPtL9CPGn7XcCRxVgKVSu/CEyiJnVKu3DaBBt8/KJndjnG4nQRaJi3OdMbZwrumOqBA5+Nw6chZXAKd3lo/UxOSQKemKJ/5uSiB8nthZ4ZejjDgAoudBFomLc50xtnCu6baUieT2lhcbFW6T/T5o3CtByv9tT4913OlBOq9HHvoXOUv3/yUfhnozGVJlb+R+pVf9szoUemrsE12EmiZtDjTGWcL75pq9TLmmuQ7pC/VftuVQI2THJZAzXPTkUD9o3Bn3zJt6FwVxT//1m4+E2ctS1n5rjxfHHJCSAKtlRZnOtMnUO/oTXq/tdNfnXMtXKongTrnpiOBOir60O//dHxa3ySyL/4USTTgg3vvTNp7a4QP//7f/stXP19cHJJ9mgRaHS3OdKZPoL5959+mlbAWOKf6Wv8+UOMkexKoc266Eqhy/4Gb2X71z/7df/3lgAVsM3SunuZt9Mn/Up6/s5alQWkwrpZj1RNxvXcbkkALpcWZzvQJ1PNkmcp93F6q/r6pAnZNsieBOuemK4GqBfj086WPzhk6V8Xrd77Q2ETDlqXR464n1FUufqt+XbdGAi2UFmc60ydQPspWPOTgzhe/9e9/9MvKq9n+/anE5wlUPAu0fl+4xkn2JFDn3HQlUHojuNNrh+l0L2CroXOVvF7JgHOWpUXHU7qzqZe+lORq7BXS4kwnIIGyYbasdmWVPK9kvy0K4L3GxKlWwONPHZPsS6CuuelKoOKmcaUbUR8+pnMB2w2dq+Q3lSU7Z1laPTk+KrWeQPln7p+O8bvKoyqaSKCF0uJMJyCBimzZ3wfmF58v1dFkrC0veJVXjmfJi2u33uybZF8Cdc1NZwKlBTl/GE/5fnJNk+xtlv65Sl/flU/EnLEsXT7808bRtYZH8nXlqgRaKC3OdCISqPYo6VMlS8d28q+3ND7nZ1cuyu2T7E2gjrnpTKCmgpx9SsckOwydq4NHjb89Y1m6PX67IYEapt51u9WdBFomLc50ylv4/VKpvl9UrMdv70dmXi3dvPL4uxe+3HRP5Z66Ubrc7eDN8gvlQ4l0NO2F0ltPBbV1kr0J1P7WngR6+m71fcUr7ZPsMHSuDoo0KA+DDV+WS9Sm/pWuv95VOsH9w8w++dsv7L/1mqyOD48d6duNZ8qqE2F8WpzplLfwR2mN2he5Q/06fV3xznfyF5+cym/D+FJf3aiUsnyy+XFO5VAivbvOpz9MdsQ/+ePeSQ5IoNa39iVQ9SjoxdPstE6yw9C5Kn92JdoGL8tFKlPvfuDErtIJbubgq8W3jooZeVxcAn7nfxkwEcanxZlOeQu/qTCnUZ375RPr5YKXPEKhIYJ668aTt5sqd7WqpuNwRZKc3lp+QFrzJIckUNtbexPo6W/Tb9CUnhnaNskOQ+cq++iWxyMMXZbLJDseuxd7BhYrneBmRdx5p3a7i0dp52pItJ0EmpwWZzrlLbw47Hl6LHFvZv/f7cfaDrewTG7D/Ol9eduftK5XjgF148nfvbEffLnz6mn0JY+62qOC9qmYJMmHb9/84Z36swQaJjksgVre2p9AN7vw3391/7sXvvRONYabJ9lh6FyVlquhaA9blkvdTH2/ml79Vm+qVjrBfu1+4XjXucffLF3zd+hch4Oj+lilBJqeFmc65S18X9aSo5yXsl/lYzn7ylGcokmuUasdBI1QNy79nidRKp3gcED76R/n/z4ca+/3ePLOtQ+jC46muTotznQqW/i9Yq86H5C7n+6aFlcq3DsdqTxq2LOWQFQ7wf1k7bV1rvohmgSanhZnOvUykSXLvdMt106jYjeV4/Sd/T9qDwMJRMPhdXGOMdtrqXWu+jCcBJqeFmc69aGSY43Pj2welb5xUlypcBi/v/PttjiQQDScYizSJtvRuV+7z4MEmgEtznTqp4uLW2MehuPSa96KqwKeFpetfaU5ECRQk4avsFZadVV21cssi3zJe9e90teZJNBMaHGmU9nCb6pAfj+C9DEBieIbk8cLkZuPgyRQk00nUHqWJ8uaf/pm6Vo+CTQTWpzp7Gpf2shrw6EYtCdQ8TiZpi+7SKAmm06ge+UHa+SngZLIedTUDBJoelqc6VS38OPFcMU1b6WvqNYcb5ZwwT0RLiCBlqbUCYph3b37xaNt0+V2LdxMaHGmU93C7+8LxWnntDuBsnuqTPI9Qgm0NKVO0HQaqNy5yhfGNU6ESWhxplPdwh/ty8DpmuzyQEk2kvKoMqQ/Sd1YQQJtTKkTVPtMfreN8uP13BNhDrQ406lu4TdV4dP//2nfdF/3S8/FfvNpZcD+nj1XGpU6QXrX9bz/lDpX8y0R9KQAWpzpVLfwm/3TT/7P5W9u5Hum+9vCHX6/LxYvJQ/mqd+bTN2g1AlKYVOkUdK59n2qaShST5qeFmc61S38ONhVfkTD4Yrrwy1Ik8eOvri/FWd6s9KuqbJBaScojeYWh82HKy0Ptyo9XNLS9LAHPWl6Wpzp1Lbwe5VaUDyMc1e6rVei4eFr6galTpBedJCkUdq5mp82pCdNT4szndoWfn9XOao5xc3pBginx4o1PyVG3aDUCe6XH71XpNGpc7U8bUhPmp4WZzq1LTy/3ODkyS/e2NWeonx8tPKdL70zbKpsT9oJ7jWdBto7dq47jc96r06EaWhxptN0DHT7K57VDa7TCfSk6WlxptNwLVzTiZ1bTpUNkkALpcWZTnULv9/09Z5bT5UNkkALpcWZTmUL77sLz2VTZYsk0EJpcaaTbOFPjpcmNV4Ue/lU2SoJtFBanOmctvBr3nhN3UACLZUWZzrVBHqxfoODW02VzZJAC6XFmc5pC/+n77U98fQ2U2WzJNBCaXGmsxtlE19O3diN0wDj281+ziXQQmlxJjTGJj732ngy+zLebvYRdI25m/kirpMWZ0IjbOPLKRtzL+LdZp5BV5i5WS/famlypnT1rXw5ZWM5c9pi3hl065mb88KtmDZnUvndiec1qfEtZ07bzbu9dxf3h8vfya1pdCa1u7roJRpkOXPaadYtvo2etDZanYndulAssGwsZ077zLrRN9CTVkezE+GWsbOskrGome0197ZfcUdaI40P41pbkVO1uR4dCUa1wnotg7gW3QjGtM5aLYO4Dp0IxrPaQi2CuAp9CEaz5jItg7gCPQjGsvIaLYO4Nf0HxrH++uxiZm5L74FRbKI4iyBuR+eBMWylNMsgbkPXgRFsqCzLIC6n48DVbasmiyAupt/AtW2uIssgLqTXwJVtsRzLIC6iz8B1bbMUuzKbS+gxcE3brcMiiPPpMHBFm67CMohz6S5wPVsvwTKI8+gscC3KrwjiPPoKXIniuyeDOIOeAteh8mZkEIPpJ3AVqm7BldkMpZfAFSi5JSKIYXQSuD0Ft0oGMYQuArem2jaQQfTTQeCWVNpmIohe+gfcjjrbSgbRQ++AW1Fku8ggOukbcBsKbDcRRBddAy6nvPaTQbTTMeBiausgMog2ugVcSmEdyE0SaKFTwGUU1TOIIBrpE3ARJfU8MogGegRcQj09mwyiRn+AC6ilFxBBVOkOcDaV9EIyiDKdAc6ljF5OBpHSFeBMauhtuDKbhI4AZ1E/b0sEUdAP4Byq5xXIIDJ6AZxB6bwOGcSBPgDDKZvXIoLY0wVgKEXzmmQQEggGUzGvTAZh9cMwyuXViaDNs/ZhCLVyFDJo46x7GEChHIsM2jRrHvqpkuNxk4Qts96hlwo5KhG0XVY79FAfRyeDtspKh26K4xRk0DZZ5dBJZZyGCNokaxy6KIuTkUEbZH1DOzVxUjJoc6xtaKUgTsyV2VtjXUMb1XB6ImhbrGpophTGkEFbYkVDI3UwjAzaDqsZmiiCgUTQZljL0EAFjCWDNsI6hhrlL54M2gRrGKrUvjlwZfYWWL9QofDNhAhaP6sXSlS9GZFBa2flQkrJmxcZtG5WLSTUu7kRQatmzcKJYjdDMmjFrFfIqXQzJYNWy1qFjDI3WyJoraxUOFLk5kwGrZNVCnsK3NzJoDWyQuGpAFoCN0lYIasTBNBCiKDVsTbBRdiLIYNWxrpk8xS1JZFBq2JNsnUq2rKIoDWxItk49WxxZNB6WI1smlq2SDJoLaxEtkwhWyhXZq+EVciGqWLLJYJWwRpku5SwRZNBK2D9sVXq1+LJoMWz9tgoxWsFRNDSWXlsk9K1DjJo2aw6tkjZWg8ZtGRWHBukZq2JK7MXzGpjexSslRFBi2WtsTmq1frIoIWyztgYpWqdZNAiWWNsizq1ViJoiawwNkWVWjEZtDxWF1uiQq2bDFoaK4vtUJ5WTwQtjHXFZihOWyCDFsWaYitUpo2QQQtiPbENqtJ2uEnCclhLbIKStCkiaCmsJLZAQdoaGbQMVhEboBhtkAxaAiuI1VOJtkkELYD1w9qpQ5slg2bP2mHlFKEtk0EzZ92wagrQxrkye96sGdZM9UEEzZkVw4qpPTyVQXNmtbBe6g5HMmiurBTWStGhIIJmyjphpZQcUjJolqwR1km9oUIGzZD1wRqpNdSJoPmxOlghlYZGMmhurAzWR5mhjQyaF6uC1VFiaOcmCbNiRbAy6gvdRNCMWA+si+pCLxk0G9YCq6K0MIQMmgnrgBVRVhhIBM2DVcB6KCoMJ4PmwAoAFmQ3a9GtszhaDJi96GQ5X3SLLYR2AmYtOksuF91yC6CNgPmql/WHMyaEzqWBgHlaSu40kULDaBxghhacPjkh1E/LAHOzgvg5EkI9NAswLytJn5wM6qBRgDlZV/wcyaA2mgSYjTXGz5EQaqQ9gJlYb/7syaAGWgOYh1Xnz54IqtEYwCysPH6ORFCZtgDmYAv5syeCUpoCiLeJA6Ajh0EJDQGE21AAiaCUdqDNvcOG8sl/iJ6PsnHnaoplfvJ3b3z+5kPuvPqvfjzmxyzKKYDu7v/1SlPd/vjPb175xN+MnA7//WuNn//e6bP3f/HMX/ZNZ78cz7e9KIIKmoE2EmgMj99ObtTyogw6SA6APnr55t+f+qChbB9eacyma5okgZwMKmgF2kigEfy/n9+VfHW0T1qO0gjc4Uhn93pD0d6HQH/lv62JEkgEZTQCbSTQ9d3fVYmgykVw+1rfVLsP0dRV069jqgQSQUfagDYS6Op+WzkC2ntzpM9ajHIAHQfbGk73PGg7Nrqu/gQapjeBRNCBJqCNBDr5p3/xzhWm/uR79QDaffpXV5jyglUCqHUYbl/SG88PXdd0CSSC9rQAyxKUQN/c3fn2raMiPwR68Uc3P/ziT48/3PmPt53solUDqG0YriUZrm7CBBJBTyUQSxOWQDe+8svbTf1+6ajnOM2tnwmqBdAxA2rV/uwIuNCUCfRQAkmgmcu+O/Lqt3509lsfv33zzjtfrl3v2zPJx280vmvQ3Dz5xc2bdy/U3/34+6/ua+0LX3qnehxx9gLWM+KyNmp+V3cC7XZfPH89JB/5vfJBz6PjNF+6xSQXr34I1PyVoPp1CO+/9Xs3v3r2979bL+y/fmu3e+ZzPzsjL07OT6BfH2bkM18vjxDmCfT+W/vzWp/9buP4oYMgCTRrT949nS54MT8RkZ9MKMrk/WQ0J3vxzeSt5TxonGT+vptJ/HxX7JhXq3HLW7O/u3nDz4vz7F8uBU0+3nT4iD9KX2qbZIdBc3W/WIjC/VL1716W9gS6SdHvVEP09FpZbSrZHxYnfrJBuS0n0H75a3X5cM1B5ZTP4fqE5OTQT18r2vmZb5Tq/if+5vC3N7//51/bVcJkwIHJudfC/fq1ZD72n/x8+lGlV0VQg40v/rw9Tir3ja9klSs/m/BS+cdDuc0TKD3nnZ5oaJlkkUCP8lferFXjtrcWCZQU9VL1rVyB/OI/9M5Nl0FzVa31RcNkv+lZlq4EummlymDc4AR68v/92z9749XPv1R544YTaL/4LSFQud55nwBJKP1VqaU/V7ywT6D/9LXs169XL144TLjncrozE+jBy8l8vFJNoI9Kr4qgBtte+nmrVba8VOUxcbiON8+aY23NfvpO6aKrUylsnWSWQP/hm8lbytW49a15At0rvXoq/u9W3nd6qX2SHYbN1fGvkux9lDRZ77I0nGUq3cpg98X0uHJwAlVl+w4bPg+0a74XXH0YrpILdytN/Xzywif+WfbLT32wP5hKk2z/c9/JnPMSqBQxu93vlxPo2fKrjdm3k0DMU8Olu3mtupeUuKy0ZsW28Xrfoua3T7L6ylefVqpx79y8Wnk1/5pL/TuYeb3vmGSHYXP1qDq99G29y9KYHU/+Nv0yTzJ0d3EClVJxk3bNCVS/M085Ow5Xy+3+4Ac3//zNXx+qfB4Zx2jaj3j99LVXjiePkjQZcnXAWQl0+OPdc/uTPO+/la32JIF22emoj39ymMnGa8l3Eoh5un8qdU/+8/Hf+T59XvNeql1QdSqu+yt+H+c/vdk3yUpRPvy2VI075qY4+Dm8mB/yVMYI92ejfvdu6Y0dk+wwbK7Kg26VAa/eZWnLjvSE1unq7EsTKJvHuX3hakL7xW9KgUNylIbhStlxCKji5dIfH+r+8YePf1YduxsyCHdeAh0vHM8+4Bgz5QR67ofHn379teoS5XYSiFkqn8qo7NOfxuGy8l87ynnp+IvSyx2TrCTQ4U/Satw1N/fSN1U+Mv8p+9Os9r/Uu4DtBs5V+cKD7LXjj/3L0h4K5cG440UelybQ4EVerV3bAxkOlT2JgfLJ/31xT6p5epncoe6fDnP2WXX6yyGDcNlhTaNaAh3+9vRphysoSnNy+rTWOzrsfx+9HiJteuFnrVJC75eLWl7ms0vXij+rXihXugC4Y5JFAt3ZX+r1u1+dPuP4etfc3Ct/ZHbU05QR+Y+H17oXsNXAuSqfYykdEvUvS9dclAfjbjGCVmqnbdq1JVD1zjylY5naocz+r7M/bgqnIskGfUXnnASqRlqaf5VzWe1fqN1JIOboWAuLU+RZvcorXmW/u9iPzpOk+EW6o90xySeV8brTnx8rZNfcZAlUvqghe1/1VMe9F774rb//x1/1TbK3XYbOVVbfS3k0+F1tksG4yxPo8TdvO4Xl27UlUOXOPOUgqR3KHF5+vaj76WtpdA0ahDsrgaqRdgjOJIFaszC1k0DMUBYx1WipJsvR6cKz6pce0xMgXZPMEyi9Gi2pxp1zc6+aMkkVbz+y6VvANkPnqpR96WHPgGXpPS4pBuMuzo/Hp1N527VrfSzqYdCqdOKnfB1C+VDmvaK6V6+/TofhBg3CnZNAtUirjgemc9J6ALaTQMxQ9ZAgq5OngpVc/Jycv6+d3U7Gn7ommSdQWlGTatw5N9Urn+tHKU033+xdwBZD5yq99qD0woBl6U2gD9+4ZQLlF2hs+raku/YEKt+Zp1y97+6qRxMPiterdT499Bg0CHfOlQjlYKx8RPXTJFCzTS/8nJWOcU5OFSsZh0uOG6rXgKU1tWuStWOn0ju756YjgeqzM3wBWwydq/TwqxQ6A5alO4Ge/PwLxbvefHrRlQgCaG/XGkDZaZTXT6FwGtA6jtDVHI83anX+NAw3bBDunASqH1Qlh2fnJNCWq/CWl33W+gv0o4bfNRxJnJVApaK58ARKYqeUK7dMoN+9W70S4fwEepRNYtBNINZr15FA6TDc/qP3FgoAAB6tSURBVN+Va6rrTglUio/TccqwQbhzE6j8HZ/TwdjwBNr4QdCWl33WBhTofBwuHQmKS6DTW+tnakIS6NQU5TM/t0qgx28kb8luz3N2AiX33tu0XUcCpcNw++L9evmVuuPf1hLodJXCsEE4CTSxLS/7rLXUyaSqFc/bLN0n+vxRuNaDlf5an57ruVIC9Z6CGTpX+cs3P6VfBjpzWcrK30jNb1F6bgLlV9B/p2dJV2/XkUDJ1cyV0y3nJVARCgMH4STQxLa87LPWe8eW5DukL9V+25VAjZMclkDNc9ORQP2jcGefyh86V0Uy5N/azWfirGVJXOuuPOmtzLdt3wytSXC6M8/pSreD5NrrunoC7fNiHx0DB+GcB5rYlpd91nq/HpPeb+30V+dcC5fqSaDOuelIoI6KPvT7Px2f1jeJ7Is/RRIN+OCBdyb98uV3Js0C8MW+Y70t6Eqg4mY7+3+k97Np/2bNw6YEysfwBg7CnXstXPlOO66FO9umF37O+u7c/9vPN9a62lXV9e8DNU6yJ4E656YrgSr3H7iZ7Vf/7N/9118OWMA2Q+fqad5Gn/wv5fk7a1lypacz3OZx3VkAbfsiuFxXAhV35tnX+dJYV1eWNCTQcWRs6CDcOQnU+30gCdRv0ws/Zx3jV8nLuZeqvy9+kdwToWuSPQnUOTddCVS999np554FbDV0rorX73yhsYmGLUsufULdmTNcku04CKCDzgTKh+HK1yE8bBz8evZPfnCq85X4OA7D1b+70+KMBKo9t7V6TwQJ1G/TCz9r2Shb8ZCDO1/81r//0S8rr2b796cSnydQ8SzQ+n3hGifZk0Cdc9OVQOmN4E6vHabTvYCths5V8nrpj85bltxVntLtdtgVnQmUDcPl53EqdT7NiOQGoQ0JdIyw+o0UWpxzb+zqo7uPt8o+faoE6rfphZ+1bG85i5KsAubBko/BFffGriRO5WuZ2U8dk+xLoK656Uqg4qZxpRtRHz6mcwHbDZ2r5DeVJTtnWUpTqj4c9XwND0va8o15dp0RdPhK0CsP6oFQueVa+u3VpgTaT+EPuy5fKDkngSr3xj4+ru7MBNpJIOYpz5b9WYdflL9Dn4y15UW28srxYq3q7cvaJ9mXQF1z05lA+fztH1eUP4ynfD+5pkn2Nkv/XKWv78rfvTljWXI3i1FcfX250sm7wnYTqPsg6FDfP/UX9efqHJ8P9I3sp48PT+zOzhQ1JdB+Os8OHYS74PlA+TPCP6o9H2hgAkWvhVDbXvpZa7jEKt+9Ty/ozb/e0vicn125KLdPsjeBOuamM4GadvuzT+mYZIehc3XwqPG3ZyxL8ZZ/8U71Vxe4V/vcPQnUIn8Wd61wH5+Reng06cc/+b39v9Mn1NXi427LZBpd8xmpEqjftpd+3up7zG+WXygfSqSjaS+U3noqqK2T7E2g9rf2JNDTd6vvK15pn2SHoXN1UGRNucoPX5ararlse9sJ1DcMt2sYPavfGi59SnctPh60TKbRWQmUH/ekzkug/Tui10KobS/9zFUKZf4l+vw4p3Iokd5d59MfJsXukz/uneSABGp9a18CVY+CXjzNTuskOwydq/JnV6Jt8LJcVcvdGDacQN0RlN39oGH07DjydlJ6CF0tPo6BNWwQ7swEevggjaDn/vXuzATavyt6HcTa+OLP3JO3myp3taqm43BFkpze+uVf9U9ySAK1vbU3gZ7+Nr2ZTelunG2T7DB0rrKPbnkQ6dBluarG6xC2nUADvhLU/O3TnySV/7nTeaLGBCpdo9bnzAR6+OvXivn43Adn3xNht/UAkkAz9+Tv3th/neXOq98uLsPK999rjwraj8MlSfLh2zd/eOfLtbLeMMlhCdTy1v4Eevr08fdf3f/uhS+9Uz2b3zzJDkPnqrRcDfcAHbYs19R8jk4C9dyZp3odQu79tz6zf/Nn/uSHlTpfj4/DdIYNwp2dQPsZ2Z+KevbrPzv/rjw7CbT15V+bS7/nCTE6I+hKht4T7taaA7CVAJJAayOBWJgJEujMYLjdBw081toTQBJobSQQCzP+QVB12Oxq3vtMfjegiz7IIdBTCbQ2EoilGT2C6s/xuZL3KhOu3qWnmwDa0wTrsoIEarlibM8zdVZp5AgafFvssx3vG1T8eM4ldwIoow3WRQKxOMeVO0ZE/GZ/s4LXxjoEyr5o9Nnv/mz/UT89XJY99BBoJ4CONMK6SCAWaKwIei/vOa83/K7BuZcrPKhNYeCxlgDKaYV1kUAs0UgRlCdEOjR2zQSqTuwZAXQuzbAuEohlGiWDstu2lcbgrppA6S0Rdrs/+GH/Gx7KnxINsS4rSCC2aYwI+vitl/fnaUq/u24C3WTQv/nM4dYNn/36wHNNAiilJYA52I1yGDQ3OwFUoimAeVh/BsmfKo0BzMRu3Rm0E0A1WgOYjRVnkPxpoj2AGVlpBsmfZloEmJUVZpD8aaNNgJnZrSmEimu9o1t1lrQKMDvJl3SiE+QW0u8aRbfoTGkXYIbK3xWNzpKzVb7rGt2as6VlgHna1UTnyhC1mY5uxlnTOsCM1WNorknUMJ/RjTd/mgiYu8YYmrfoJlsI7QQsRHSqDBPdSouitYCFic6YFtHNskQajbMtbXNb2vzCZtguOdfyCvry5hi2wWbJeZZZzZc517B2NkrOsthKvtgZhxWzTXKGJR9KLHneYaVskQy38Bq+8NmH9bFBMtTyDyKWvwSwLjZHhllH9V7HUsBa2BgZZC2ley3LAatgW2SANdXtNS0LLJwtkX4rq9krWxxYLhsifdZ30LC+JYJlshnSY5XVepULBYtjK6TTWg8X1rpcsCi2QbqsuE6veNFgKWyCtFv3gcK6lw6WwAZIm/VX6PUvIcybzY8WWyjPW1hGmDFbH422Upu3spwwS7Y9mmyoLm9oUWFubHrUbevAYFtLC3Niw6NmcxV5cwsMM2G7o2KLhwRbXGaYAVsdZRutxRtdbIhloyO13YOB7S45xLHJcbLtKrztpYcINjgKWy/BW19+mJztjYz6qw1gYrY2jtTeA80AE7KxsWfnP6clYDo2NZ7a8y/RGDAVWxp2+yu0B0zEdrZ56m2dNoFJ2Mq2TrFtolVgCjaybVNp22gZGJ9NbNNU2Q4aB8ZmC9swu/ndtA+MzPa1XeprL00Eo7J5bZUd/CG0EozJxrVRKutAGgrGY9vaJLv2w2krGI0ta4PU1PNoLxiJ7Wp7FNRzaTEYh81qa1TTS2g1GIONamNU0gtpOLg+29Sm2JW/nLaDq7NFbYkaeiuaD67MBrUdduJvSwvCddmctkL1vAatCNdkY9oIpfM6tCNckW1pE9TN69GWcDW2pC1QM69Kc8KV2JDWz077tWlRuA6b0eqpliPQqHANtqKVs7s+Du0KV2AbWjd1cjSaFm7NJrRmdtTHpHXhtmxA66VCjk0Lw+3YfFZLeRyfNoZbsfWslNo4De0Mt2DbWSd1cTKaGi5m01kjO+ZT0tpwKRvOCqmIE9PgcBnbzerYJZ+eNoeL2GpWRi2Mod3hAraZdVEIo2h5OJ9NZk1UwUhaH85lg1kRFTCYFQDnsb2shl3weNYBnMXWshZq3yxYDXAGG8s62PmeC2sChrOprIKqNyNWBgxlS1kBu93zYn3AQLaTxVPv5sc6gUFsJUun2M2RtQJD2EiWTaWbK2sG+tlEFk2VmzErB/rYQhbMbva8WT/Qw/axXOrb7FlF0MnmsVR2sJfAWoIuNo5lUtmWwpqCdjaNRVLWlsO6gla2jAVS05bF+oIWtovlUc8WxyqDRjaLpbFDvUTWGjSxUSyMSrZQVhzU2SYWxa70cll3UGOLWBI1bNGsPqiwQSyHneilswahzOawFKrXGliLkLIxLITStQ7WIyRsC4ugbq2HdQkFW8ISqFmrYnVCxoYwf3aa18YahSObweypVitkpcKerWDm7C6vk/UKTyXQ3KlTq2XVggSaNTvKa2btgg1gvlSotbOG2Trdf7aUp/Wzjtk4vX+m1KZtsJ7ZNH1/ntSlzbCq2TBdf47sGG+Jtc126fgzpCJtjBXOVun3s2OXeHusczZKr58ZtWibrHc2SZ+fF4Voq6x5tkiXnxNVaMusfbZHh58RFWjjdAC2Rn+fDbvA6ANsjN4+F2oPT3UDNkZnnwc7vxzpCWyJrj4Lqg4FnYHt0NNnwG4vKf2BzdDPw6k3VOkTbIReHk2xoU6vYBt08lgqDc30DLZAFw+lytBK52D99PBAdnPpon+wevp3HPWFHroIK6d7R7GDSz+9hHXTuWOoLAyjp7BmunYIZYWh9BVWTM8OoKZwDv2F1dKvp6eecCZdhpXSrW9lN5XoBeVik/WR4aKbBHI64wVUD/qE9pHhopuJrdMFzxVdM3bKxuxFd5CzRDcWm6b/naO++T6chrqxGGF9ZDi9idnQ9wabR01RNuZsHn1kOL2JYDreMHMqLcrGPM2pjwynNxFJrxtifpVF1Zib+fWR4fQmouhz/WZaWlSNGZlpHxlObyKEHtdj1qVF2ZiFWfeR4fQmpqe7dZp9bVE1ws2+jwynNzE1na3DImqLqhFqEX1kOL2JaelqrZZTWlSNKMvpI8PpTUxIR2uzqNqiaIRYVB8ZTm9iMvpZi6XVFkVjekvrI8PpTUxEN2u0xNqiaExriX1kOL2JSehlTZZZXOy4TmmZfWQ4vYkp6GMNllpcFI3pFH3k7v4frzStjo///OaVT/zNyCv9vZsPeeYvR5iw3sQEdLG6hebPQ0VjOqc+8tHLN//81AcNa+PwSmM2XdNoCaQ3MQE9rGqpB0BHisYU0j5yONLZvd6wLsbLhqk+RW9ibDpY1aIDSNGYRKmP7CNg93x9TRyiqeH3VzZqzulNjEz/qlh4ACkaEyj3kcNgW8Ppngdtx0bXNe6Rlt7EuHSvsloAPTh+Pa8+0H8oPJeM898t3vXfvzZG8VA0RlbpI23DcHfbzg9d18hjfXoTo9K7ympHQFkC1bfx93YzTaCHasa4qn2keRhuv3LHvw5h/LNNIogx6Vwl9TG4LIFqteS45zvXBLJax1PrI4esqQ3DvTfFpdgPJ7jeQW9iRDpXquEkUJ5A1fGUbBBujgkkgsbU0EeavhJUvw7h/bd+7+ZXz/7+d+sr7Ndv3XSEz/3sopXdl0Af//SP91312T/4wUWT15sYlb6VargKYZ9Az75c35292fCf+b1bJtBo1IzxtPSR6i7KYQ8lOTn009d2uWe+cfr13X3PyvZmnvnntYG7u/2X03Un0Md/9XLxuc9mn3sIx9PHvFea+Qf1TxRBjEfXSjTs3h62yE/9Re1U834rfu61GSeQFTuOpj5yGIarpEC5rj/8q13qc8UL+wT6T1/Lfv169eKFw4R7LqfrTKBfv9b0ue+lKVO5c0NT79SdGI2eddJUXI4J9JPajuF+t/UP/7xha+03QQKpGWNp7CMNw3CV6xDu7sqeT174xD/LfvmpDx5U0uTBrv9cUlcCffRy5XOP8VaabDk+mweGdSfGomOdNBaXQwL931+rVoL9INz/Nt8EMg43ksY+0nBnnnJ2HK+bPJyJ+c1fv5zG1TGa9uNyP33tler42JBBuK4EOoRLdoLp/T/eFZ9bipkHu3R+9kvScAm57sRIdKxC8+7tIYH+nz+vjobczX9bypL2k83HVz773YcXJ9Bv/no/hc98PSsPv84m2PKFEyVjHM0JdEiOUgyUsuMQUMXLpT8+JNDxh49/Vh27GzII15VAh+B77ofZTz85zUXaBd8rHZS919w5dSdGomMVmovLIYE+KI2cZ7Xhleoea8vJ5ofpcPxNPWi8Fq66t3v6ef9HN0Xpv+UDKoc3fFycV3iuufgYNxlF815KVsaTrlAezLpbDon0Mrm7u9KK32fV6S+HDMJ1JNAhwJIDmgfFZyXdOftWQfZn+5+aJqY7MRL9qtCVQB9VroZ7sN9QKwnUdrI52/ssAuSiBLqbTqH4MtJpB7qpZkQ36Aq1BFDtzjylY5naoUzSm5rCqehSQwbhOhKo9krxi2QG9jP3meKnlkE43Ymx6Fe5lt3bYwJV9w3v5r8sl4vU86VJnBxOPJ+bQP97OoVSHrXd+UXJGENbAlXuzFPuGbVDmY9Pg7p3K6+l0TVoEK4jgWoBVlwekXTn/b7U//py/jkN12If6U6MQ7/KtRSXYwJVxscPg3DlOtN+svl4Pvi5/Rmb99/KUuPMBDpM4GbSHx+Ppm72WXf708u/+av2gyAlYwxtCZSMcO2VD5mrQ7jp2Zbq9dfpMNygQbj2BGoIsKJX3U0j8HA5+Gl2GjNPd2Ic+lWuO4HKwxMPDhtqmkB9J5vzQbnS/UzPSaBXSu/Pp/2TNOqUjNG1JlD5zjzltVm/9OR0qFFd72mfGjQI155A5XNKxd8eenExA/vPe37/Sc/nS9GceboT49CvMrvuBCoPwx2+yV6rFs0nmw+1qXyu+YIESr9Akuyntj+DZqdmXF9bJ8nXS7ZaytchpCftEsf9mVrKnIbhhg3CdSdQJU6Ko6rs8pbjH72yn8bh162DcLoTI9GtMm21JUug0o7sfvt9vrS/2nGyuXqHyuSyqeEJdJpA5Q4wla/el2tGdJuuTnsAlYbhHuxq11R3JVDp+OjUcYYNwrUnUMP7i18VO1SHS2qy/x6n1ZJ5uhOj0K0yfQmUVpWPjmdukwTqONlczZbk2/LDE6hyDNV2Uy8lY1wdCZTuKJTPprQk0PFvawnU3nFaXJRARdYcj+aTDt2WeboTo9CtMn0JlI6sZIMWSQK1n2yuHR1VjpyGJdCpSFVG98u720rGuDoSKLkzT2X467wEKkbCBg7CXZhA+3l8Pj8NlP+v9Vps3Ymx6FaZXUt1KUr8acc2217TLGkvJPXzwacLoYYn0KkSDU2gnZJxfW2dJF8xx1VRua9Acu11Xb3j5FcDDByEO/s80OkkU3aBzSsPs28XHF5uuVuH7sQ4dKtcXwKdztJmYxZJAnWcbK7XgdN2fmkCJdPrSqDoFl2hjgQqrn+sfneseuuMkoZbNGW7OgMH4S67Fi4ffnvvdALo5qe77fc41Z0Yh36V60ug05Wq+WZ7Ki0dJ5vru7ISaLk6Eqi4xKQ2ltWVJQ0JdNzVGToId873gdIrJ49fKLibdaabeX7mL4sL5BroToxDv8q1jLCcSnw+DLffjg+/6U2g5sEUCbRcXQmUD8PVvtVZ7wLvPfsnPzit50oCHXd1GsbQmp1xT4T0KpbD+FuROYevWHcMwulOjES/KvQlUH61QT52LoG2pyuBsmG4+rc6K5cvli5vbLpR+iHC6te2tLjkvnDZ3H7q/0q78vMd12LrToxEvyr0JVBe+/NBuOp5oOaNN+g80E7JGEPLgfJpXTQeSVS+rpx+e7UpgfZT+MOuyxdKeu6NXekrp86yP/Xzr4sPee/4yN+Oa7F1J8agXxWaq8upxGcxUwzCVROoeQBjjGvhhiVQdHuuUlcCHZ+G8Bf1QDjesil/YMfxyRqnCwJqHWc/nWeHDsKd+XygU096cDpOP/3UdtSlOzESHeukJ4Gy6DjelfRQSspXYzdvvfXzwUOezlC9q48EmomuBCpuj17rCcfb1h5uTvvxT/ZPFizdNLC263K3Ow7q0x7+jNTnKy/mnSe736BBOCamY500HgQ9KG+ln/rgQbHBd90T4XSy+aJ7IiRnCi5KIKMmI+kfhmuq4/Wr9dOndNcS6EHLZBq9V510Mv2PXq78Ou0qd9MZOc6hQTimpmMlmspLUuKPX/S4W2yoSQJ1nGx+UNlHfXDa7pMEqtxdJ7nL2CUJpGKMpiuCssOKhjr+cfnxheWnStUSqDsOKroSKHk470H62MTsjXnPqjystUR3YjR6Vqo7gQ7b7P/0tfIDjpvvjZ2cbK48K/lYpWoJ9CCtBqX7YV+YQNFNuVodCZQV9cYTgulzcpMnqzcm0HE6wwbhuhOoFH3Pfbf0xvJzXavPGU8IIMaja6UadnDTEl8ZZak/H6jxZPN7yb8f/vprSYFIEqhUDrI914sTSMkYUUMfKRf1ttsKvP/W/smCu8/8yQ+TXzYn0GE6wwbhehLopi/+H3+8n9qzf/CDyhsP3bfoOw13UMjoTYxI3yqpl5e0xFdGWRqekdp0svk4pvLM1w8PNX05LRBJAmV/9I0Psj969uXLE0jJGFVXBF3J0HvCTUJvYkQ6V1lnAmUnb/PhkVICtZ9srnxf9Zl/05RAxfHV0Sf+z9slUHQzrtr4CdR8ZBTD/gxj0rnKaju4pRJ/jInSA0qLQtF6srlyPvj1xm+kPnyYvv2Zv/zo8gRSMkY2+kFQ+RmrsfQmRqV3VVTLS6nEl79lXv0easvJ5odpOj3zjeZ7Itz4b8Xbn/vhw8sTSMkY3dgR1P7Qp8npTYxL96qqlJdyNSh9bad+J4TGk80Hv/nr/SvPfv2DlrvyHKb30/0542cOp4wvTiAlYwLjRtDg22KPT29iZPpX1W70QZYxKRlTGK2P/OZmZ+L912ZzCKQ3MTYdrG65EbRTMiYyUh8pLq1+veF3DUa9XEFvYnx6WIOlHgYpGdMZp4/kV0Sm30aNSiC9iQnoYk2WGUFKxpRG6SPZjdw+Vb97zuQJpDcxBX2s0RIjSMWY1hh95OO3biLos+W758QkkN7EJPSyFkvLILus01taHxlOb2Iiulmb3ZLqy07JiLCoPjKc3sRk9LNWu8XUl52SEWQ5fWQ4vYkJ6WgddksoMDsVI9Ii+shwehPT0tU6zb6+qBjhZt9HhtObmJrO1mPW9UXFmIVZ95Hh9Camp7v1m2mBUTBmZKZ9ZDi9iRB63BC72RWYnYoxM/PrI8PpTUTR54bZ7eZTYZJ5iW4WEnPqI8PpTUTS6wabR4FRMOZsHn1kOL2JYDreOXY101cKBWPewvrIcHoTs6Hvnau++U4uugnoEd1BzhLdWGya/ncBBYM+oX1kuOhmYut0wVtRKegzWR8ZLrpJIKczAhBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARBDAgEQQwIBEEMCARDjfwBUcwVdfO/HKAAAAABJRU5ErkJggg==" />

<!-- rnb-plot-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuTkFcbk5BXG5gYGBcbmBgYCJ9 -->

```r
```r
NA
NA

<!-- rnb-source-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->



<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuXG5jYXJldDo6Y29uZnVzaW9uTWF0cml4KGdpbmlJbmRleDUkcHJlZCRvYnMsZ2luaUluZGV4NSRwcmVkJHByZWQpXG5gYGBcbmBgYCJ9 -->

```r
```r

caret::confusionMatrix(giniIndex5$pred$obs,giniIndex5$pred$pred)

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiQ29uZnVzaW9uIE1hdHJpeCBhbmQgU3RhdGlzdGljc1xuXG4gICAgICAgICAgIFJlZmVyZW5jZVxuUHJlZGljdGlvbiAgSGlnaCBMb3cgTWVkaXVtIFZlcnlfSGlnaCBWZXJ5X0xvd1xuICBIaWdoICAgICAgICAgMCAgMjMgICAgIDk5ICAgICAgICA0MSAgICAgICAxMVxuICBMb3cgICAgICAgICAgMCAxMTggICAgIDUxICAgICAgICAxNyAgICAgICA4NFxuICBNZWRpdW0gICAgICAgMCAgNjYgICAgMTA3ICAgICAgICAzNiAgICAgICAyOFxuICBWZXJ5X0hpZ2ggICAgMCAgIDQgICAgIDgyICAgICAgIDE2MiAgICAgICAgNFxuICBWZXJ5X0xvdyAgICAgMCAgODUgICAgIDM5ICAgICAgICAxOSAgICAgIDEyMVxuXG5PdmVyYWxsIFN0YXRpc3RpY3NcbiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXG4gICAgICAgICAgICAgICBBY2N1cmFjeSA6IDAuNDI0NCAgICAgICAgIFxuICAgICAgICAgICAgICAgICA5NSUgQ0kgOiAoMC4zOTYyLCAwLjQ1MylcbiAgICBObyBJbmZvcm1hdGlvbiBSYXRlIDogMC4zMTU4ICAgICAgICAgXG4gICAgUC1WYWx1ZSBbQWNjID4gTklSXSA6IDEuOTc4ZS0xNSAgICAgIFxuICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBcbiAgICAgICAgICAgICAgICAgIEthcHBhIDogMC4yNjkyICAgICAgICAgXG4gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIFxuIE1jbmVtYXIncyBUZXN0IFAtVmFsdWUgOiA8IDIuMmUtMTYgICAgICBcblxuU3RhdGlzdGljcyBieSBDbGFzczpcblxuICAgICAgICAgICAgICAgICAgICAgQ2xhc3M6IEhpZ2ggQ2xhc3M6IExvdyBDbGFzczogTWVkaXVtIENsYXNzOiBWZXJ5X0hpZ2hcblNlbnNpdGl2aXR5ICAgICAgICAgICAgICAgICAgIE5BICAgIDAuMzk4NjUgICAgICAgMC4yODMwNyAgICAgICAgICAgMC41ODkxXG5TcGVjaWZpY2l0eSAgICAgICAgICAgICAgIDAuODU0NiAgICAwLjgzMTMwICAgICAgIDAuODQxMjcgICAgICAgICAgIDAuOTAyNFxuUG9zIFByZWQgVmFsdWUgICAgICAgICAgICAgICAgTkEgICAgMC40MzcwNCAgICAgICAwLjQ1MTQ4ICAgICAgICAgICAwLjY0Mjlcbk5lZyBQcmVkIFZhbHVlICAgICAgICAgICAgICAgIE5BICAgIDAuODA3OTggICAgICAgMC43MTc3MSAgICAgICAgICAgMC44ODA0XG5QcmV2YWxlbmNlICAgICAgICAgICAgICAgIDAuMDAwMCAgICAwLjI0NzI4ICAgICAgIDAuMzE1NzkgICAgICAgICAgIDAuMjI5N1xuRGV0ZWN0aW9uIFJhdGUgICAgICAgICAgICAwLjAwMDAgICAgMC4wOTg1OCAgICAgICAwLjA4OTM5ICAgICAgICAgICAwLjEzNTNcbkRldGVjdGlvbiBQcmV2YWxlbmNlICAgICAgMC4xNDU0ICAgIDAuMjI1NTYgICAgICAgMC4xOTc5OSAgICAgICAgICAgMC4yMTA1XG5CYWxhbmNlZCBBY2N1cmFjeSAgICAgICAgICAgICBOQSAgICAwLjYxNDk3ICAgICAgIDAuNTYyMTcgICAgICAgICAgIDAuNzQ1N1xuICAgICAgICAgICAgICAgICAgICAgQ2xhc3M6IFZlcnlfTG93XG5TZW5zaXRpdml0eSAgICAgICAgICAgICAgICAgICAwLjQ4NzlcblNwZWNpZmljaXR5ICAgICAgICAgICAgICAgICAgIDAuODQ5M1xuUG9zIFByZWQgVmFsdWUgICAgICAgICAgICAgICAgMC40NTgzXG5OZWcgUHJlZCBWYWx1ZSAgICAgICAgICAgICAgICAwLjg2MzlcblByZXZhbGVuY2UgICAgICAgICAgICAgICAgICAgIDAuMjA3MlxuRGV0ZWN0aW9uIFJhdGUgICAgICAgICAgICAgICAgMC4xMDExXG5EZXRlY3Rpb24gUHJldmFsZW5jZSAgICAgICAgICAwLjIyMDZcbkJhbGFuY2VkIEFjY3VyYWN5ICAgICAgICAgICAgIDAuNjY4NlxuIn0= -->

Confusion Matrix and Statistics

       Reference

Prediction High Low Medium Very_High Very_Low High 0 23 99 41 11 Low 0 118 51 17 84 Medium 0 66 107 36 28 Very_High 0 4 82 162 4 Very_Low 0 85 39 19 121

Overall Statistics

           Accuracy : 0.4244         
             95% CI : (0.3962, 0.453)
No Information Rate : 0.3158         
P-Value [Acc > NIR] : 1.978e-15      
                                     
              Kappa : 0.2692         
                                     

Mcnemar’s Test P-Value : < 2.2e-16

Statistics by Class:

                 Class: High Class: Low Class: Medium Class: Very_High

Sensitivity NA 0.39865 0.28307 0.5891 Specificity 0.8546 0.83130 0.84127 0.9024 Pos Pred Value NA 0.43704 0.45148 0.6429 Neg Pred Value NA 0.80798 0.71771 0.8804 Prevalence 0.0000 0.24728 0.31579 0.2297 Detection Rate 0.0000 0.09858 0.08939 0.1353 Detection Prevalence 0.1454 0.22556 0.19799 0.2105 Balanced Accuracy NA 0.61497 0.56217 0.7457 Class: Very_Low Sensitivity 0.4879 Specificity 0.8493 Pos Pred Value 0.4583 Neg Pred Value 0.8639 Prevalence 0.2072 Detection Rate 0.1011 Detection Prevalence 0.2206 Balanced Accuracy 0.6686




<!-- rnb-output-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->


### Gain ratio

The gain ratio, a normalized measure of information gain, is calculated by dividing information gain by the split information. The attribute that yields the highest gain ratio is chosen as the splitting attribute. The C4.5 algorithm employs the gain ratio.

The J48 is the Java-based open-source implementation of the C4.5 algorithm, and it is included in the Weka package. This implementation allows users to conveniently apply the C4.5 decision tree.

#### 10 Folds


<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuc2V0LnNlZWQoMTApXG5jdHJsIDwtIHRyYWluQ29udHJvbChtZXRob2QgPSBcImN2XCIsIG51bWJlciA9IDEwLCByZXR1cm5SZXNhbXA9XCJhbGxcIiwgc2F2ZVByZWRpY3Rpb25zPVwiZmluYWxcIilcbmdhaW5SYXRpbzEwIDwtIHRyYWluKHNhbGFyeV9pbl91c2QgfiAuLCBkYXRhID0gYmFsYW5jZWRfZGF0YXNldCwgbWV0aG9kID0gXCJKNDhcIix0ckNvbnRyb2wgPSBjdHJsKVxucGxvdChnYWluUmF0aW8xMCRmaW5hbE1vZGVsKVxuYGBgIn0= -->

```r
set.seed(10)
ctrl <- trainControl(method = "cv", number = 10, returnResamp="all", savePredictions="final")
gainRatio10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "J48",trControl = ctrl)
plot(gainRatio10$finalModel)

gainRatio10cm = caret::confusionMatrix(gainRatio10$pred$obs, gainRatio10$pred$pred)

gainRatio10cm
Confusion Matrix and Statistics

           Reference
Prediction  High Low Medium Very_High Very_Low
  High       100  18     32        21        3
  Low         28 148     40         3       51
  Medium      53  49    115        14        6
  Very_High   28   6     18       194        6
  Very_Low     2  39     11         5      207

Overall Statistics
                                          
               Accuracy : 0.6383          
                 95% CI : (0.6103, 0.6655)
    No Information Rate : 0.2281          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.5465          
                                          
 Mcnemar's Test P-Value : 0.167           

Statistics by Class:

                     Class: High Class: Low
Sensitivity              0.47393     0.5692
Specificity              0.92495     0.8698
Pos Pred Value           0.57471     0.5481
Neg Pred Value           0.89150     0.8792
Prevalence               0.17627     0.2172
Detection Rate           0.08354     0.1236
Detection Prevalence     0.14536     0.2256
Balanced Accuracy        0.69944     0.7195
                     Class: Medium
Sensitivity                0.53241
Specificity                0.87564
Pos Pred Value             0.48523
Neg Pred Value             0.89479
Prevalence                 0.18045
Detection Rate             0.09607
Detection Prevalence       0.19799
Balanced Accuracy          0.70402
                     Class: Very_High
Sensitivity                    0.8186
Specificity                    0.9396
Pos Pred Value                 0.7698
Neg Pred Value                 0.9545
Prevalence                     0.1980
Detection Rate                 0.1621
Detection Prevalence           0.2105
Balanced Accuracy              0.8791
                     Class: Very_Low
Sensitivity                   0.7582
Specificity                   0.9383
Pos Pred Value                0.7841
Neg Pred Value                0.9293
Prevalence                    0.2281
Detection Rate                0.1729
Detection Prevalence          0.2206
Balanced Accuracy             0.8483

5 Folds

set.seed(10)
ctrl <- trainControl(method = "cv", number = 5, returnResamp="all", savePredictions="final")
gainRatio5 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "J48",trControl = ctrl)
plot(gainRatio5$finalModel)

```r
set.seed(10)
ctrl <- trainControl(method = \cv\, number = 10, returnResamp=\all\, savePredictions=\final\)
gainRatio10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = \J48\,trControl = ctrl)
plot(gainRatio10$finalModel)

<!-- rnb-source-end -->

<!-- rnb-plot-begin -->

<img src="data:image/png;base64," />

<!-- rnb-plot-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->


#### 3 Folds


<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuc2V0LnNlZWQoMTApXG5jdHJsIDwtIHRyYWluQ29udHJvbChtZXRob2QgPSBcImN2XCIsIG51bWJlciA9IDMsIHJldHVyblJlc2FtcD1cImFsbFwiLCBzYXZlUHJlZGljdGlvbnM9XCJmaW5hbFwiKVxuZ2FpblJhdGlvMyA8LSB0cmFpbihzYWxhcnlfaW5fdXNkIH4gLiwgZGF0YSA9IGJhbGFuY2VkX2RhdGFzZXQsIG1ldGhvZCA9IFwiSjQ4XCIsdHJDb250cm9sID0gY3RybClcbnBsb3QoZ2FpblJhdGlvMyRmaW5hbE1vZGVsKVxuYGBgIn0= -->

```r
set.seed(10)
ctrl <- trainControl(method = "cv", number = 3, returnResamp="all", savePredictions="final")
gainRatio3 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "J48",trControl = ctrl)
plot(gainRatio3$finalModel)

gainRatio3cm=caret::confusionMatrix(gainRatio3$pred$obs, gainRatio3$pred$pred)

gainRatio3cm
Confusion Matrix and Statistics

           Reference
Prediction  High Low Medium Very_High Very_Low
  High        94  18     39        19        4
  Low         25 129     39         3       74
  Medium      47  47    110        19       14
  Very_High   14   4     27       200        7
  Very_Low     3  32      9        12      208

Overall Statistics
                                          
               Accuracy : 0.619           
                 95% CI : (0.5909, 0.6467)
    No Information Rate : 0.2565          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5216          
                                          
 Mcnemar's Test P-Value : 0.007322        

Statistics by Class:

                     Class: High Class: Low
Sensitivity              0.51366     0.5609
Specificity              0.92110     0.8542
Pos Pred Value           0.54023     0.4778
Neg Pred Value           0.91300     0.8910
Prevalence               0.15288     0.1921
Detection Rate           0.07853     0.1078
Detection Prevalence     0.14536     0.2256
Balanced Accuracy        0.71738     0.7075
                     Class: Medium
Sensitivity                 0.4911
Specificity                 0.8695
Pos Pred Value              0.4641
Neg Pred Value              0.8812
Prevalence                  0.1871
Detection Rate              0.0919
Detection Prevalence        0.1980
Balanced Accuracy           0.6803
                     Class: Very_High
Sensitivity                    0.7905
Specificity                    0.9449
Pos Pred Value                 0.7937
Neg Pred Value                 0.9439
Prevalence                     0.2114
Detection Rate                 0.1671
Detection Prevalence           0.2105
Balanced Accuracy              0.8677
                     Class: Very_Low
Sensitivity                   0.6775
Specificity                   0.9371
Pos Pred Value                0.7879
Neg Pred Value                0.8939
Prevalence                    0.2565
Detection Rate                0.1738
Detection Prevalence          0.2206
Balanced Accuracy             0.8073

analasys of the gain ratio classification

all 3 trees seem to have the same structure that is

the attribute that was first selected at the node is the experience level, it has divided the tree into : right subtree : SE(Senior level) EX(Executive level) left subtree : EN(Entry-level) MI(Mid level)

Each of these subtrees further refines the classification based on the attribute “employee residence.” However, there are different criteria for splitting in the right and left subtrees:

In the Right Subtree:

The split is based on whether the tuple has the value “Latin America & Caribbean.” In the Left Subtree:

If the experience level is 1, the tree further partitions based on whether the tuple has the value “North America.” If the experience level is 2, the split is based on “employee residence” being “Latin America & Caribbean.”

The decision tree continues to select the most appropriate attributes for splitting at each node, progressively refining the decision process until it reaches the leaves, where final class labels are assigned to the instances.

rbind("10 Folds"=macro(gainRatio10cm), "5 Folds"=macro(gainRatio5cm), "3 Folds"=macro(gainRatio3cm)  )

Based on the evaluation metrics of average Sensitivity,Precision ,Specificity, and Accuracy, it is evident that the gain ratio model, built using a 10-fold cross-validation approach, exhibits superior performance compared to the other two models. However, it’s worth noting that the difference in performance between the models is relatively small. Notably, as the number of folds decreases, a corresponding decline in the model’s performance becomes apparent.

Information gain

Information Gain is a metric used to decide which attribute to choose for splitting the data at each node in the decision tree. For a given dataset, the Information Gain of an attribute is calculated by comparing the entropy before and after the dataset is split based on that attribute. The attribute with the highest Information Gain is chosen as the splitting attribute.

10 Folds

set.seed(10)
ctrl <- trainControl(method = "cv", number = 10, returnResamp="all", savePredictions="final")


infoGain10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "C5.0",trControl = ctrl)

c5model <- C5.0(salary_in_usd ~ .,
                       data = data_balanced,
                       trials = infoGain10$bestTune$trials, 
                       rules = FALSE,
                       control = C5.0Control(winnow = infoGain10$bestTune$winnow))
plot(c5model)
caret::confusionMatrix(infoGain10$pred$obs, infoGain10$pred$pred)

5 Folds

set.seed(10)
ctrl <- trainControl(method = "cv", number = 5, returnResamp="all", savePredictions="final")


infoGain5 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "C5.0",trControl = ctrl)

c5model <- C5.0(salary_in_usd ~ .,
                       data = data_balanced,
                       trials = infoGain5$bestTune$trials, 
                       rules = FALSE,
                       control = C5.0Control(winnow = infoGain5$bestTune$winnow))
plot(c5model)
caret::confusionMatrix(infoGain5$pred$obs, infoGain5$pred$pred)

3 Folds

set.seed(10)
ctrl <- trainControl(method = "cv", number = 3, returnResamp="all", savePredictions="final")


infoGain3 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "C5.0",trControl = ctrl)

c5model <- C5.0(salary_in_usd ~ .,
                       data = data_balanced,
                       trials = infoGain3$bestTune$trials, 
                       rules = FALSE,
                       control = C5.0Control(winnow = infoGain3$bestTune$winnow))
plot(c5model)
caret::confusionMatrix(infoGain3$pred$obs, infoGain3$pred$pred)

Clustering

1- prepreocessing

we encoding the rest of factor columns to transformed into numeric types before clustering

```r
caret::confusionMatrix(infoGain3$pred$obs, infoGain3$pred$pred)

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiQ29uZnVzaW9uIE1hdHJpeCBhbmQgU3RhdGlzdGljc1xuXG4gICAgICAgICAgIFJlZmVyZW5jZVxuUHJlZGljdGlvbiAgSGlnaCBMb3cgTWVkaXVtIFZlcnlfSGlnaCBWZXJ5X0xvd1xuICBIaWdoICAgICAgICA4NSAgMjUgICAgIDQwICAgICAgICAxOSAgICAgICAgNVxuICBMb3cgICAgICAgICAxOSAxMzcgICAgIDM3ICAgICAgICAgNSAgICAgICA3MlxuICBNZWRpdW0gICAgICA1NSAgNjQgICAgIDkzICAgICAgICAxMSAgICAgICAxNFxuICBWZXJ5X0hpZ2ggICAxNiAgIDggICAgIDIyICAgICAgIDE5NyAgICAgICAgOVxuICBWZXJ5X0xvdyAgICAgNCAgNDMgICAgIDExICAgICAgICAgOSAgICAgIDE5N1xuXG5PdmVyYWxsIFN0YXRpc3RpY3NcbiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIFxuICAgICAgICAgICAgICAgQWNjdXJhY3kgOiAwLjU5MjMgICAgICAgICAgXG4gICAgICAgICAgICAgICAgIDk1JSBDSSA6ICgwLjU2MzksIDAuNjIwMylcbiAgICBObyBJbmZvcm1hdGlvbiBSYXRlIDogMC4yNDgxICAgICAgICAgIFxuICAgIFAtVmFsdWUgW0FjYyA+IE5JUl0gOiA8IDJlLTE2ICAgICAgICAgXG4gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBcbiAgICAgICAgICAgICAgICAgIEthcHBhIDogMC40ODc0ICAgICAgICAgIFxuICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgXG4gTWNuZW1hcidzIFRlc3QgUC1WYWx1ZSA6IDAuMDExNDkgICAgICAgICBcblxuU3RhdGlzdGljcyBieSBDbGFzczpcblxuICAgICAgICAgICAgICAgICAgICAgQ2xhc3M6IEhpZ2ggQ2xhc3M6IExvdyBDbGFzczogTWVkaXVtIENsYXNzOiBWZXJ5X0hpZ2hcblNlbnNpdGl2aXR5ICAgICAgICAgICAgICAwLjQ3NDg2ICAgICAwLjQ5NDYgICAgICAgMC40NTgxMyAgICAgICAgICAgMC44MTc0XG5TcGVjaWZpY2l0eSAgICAgICAgICAgICAgMC45MTI1NyAgICAgMC44NTU0ICAgICAgIDAuODU1MTMgICAgICAgICAgIDAuOTQyNVxuUG9zIFByZWQgVmFsdWUgICAgICAgICAgIDAuNDg4NTEgICAgIDAuNTA3NCAgICAgICAwLjM5MjQxICAgICAgICAgICAwLjc4MTdcbk5lZyBQcmVkIFZhbHVlICAgICAgICAgICAwLjkwODExICAgICAwLjg0OTAgICAgICAgMC44ODU0MiAgICAgICAgICAgMC45NTM0XG5QcmV2YWxlbmNlICAgICAgICAgICAgICAgMC4xNDk1NCAgICAgMC4yMzE0ICAgICAgIDAuMTY5NTkgICAgICAgICAgIDAuMjAxM1xuRGV0ZWN0aW9uIFJhdGUgICAgICAgICAgIDAuMDcxMDEgICAgIDAuMTE0NSAgICAgICAwLjA3NzY5ICAgICAgICAgICAwLjE2NDZcbkRldGVjdGlvbiBQcmV2YWxlbmNlICAgICAwLjE0NTM2ICAgICAwLjIyNTYgICAgICAgMC4xOTc5OSAgICAgICAgICAgMC4yMTA1XG5CYWxhbmNlZCBBY2N1cmFjeSAgICAgICAgMC42OTM3MiAgICAgMC42NzUwICAgICAgIDAuNjU2NjMgICAgICAgICAgIDAuODc5OVxuICAgICAgICAgICAgICAgICAgICAgQ2xhc3M6IFZlcnlfTG93XG5TZW5zaXRpdml0eSAgICAgICAgICAgICAgICAgICAwLjY2MzNcblNwZWNpZmljaXR5ICAgICAgICAgICAgICAgICAgIDAuOTI1NlxuUG9zIFByZWQgVmFsdWUgICAgICAgICAgICAgICAgMC43NDYyXG5OZWcgUHJlZCBWYWx1ZSAgICAgICAgICAgICAgICAwLjg5MjhcblByZXZhbGVuY2UgICAgICAgICAgICAgICAgICAgIDAuMjQ4MVxuRGV0ZWN0aW9uIFJhdGUgICAgICAgICAgICAgICAgMC4xNjQ2XG5EZXRlY3Rpb24gUHJldmFsZW5jZSAgICAgICAgICAwLjIyMDZcbkJhbGFuY2VkIEFjY3VyYWN5ICAgICAgICAgICAgIDAuNzk0NFxuIn0= -->

Confusion Matrix and Statistics

       Reference

Prediction High Low Medium Very_High Very_Low High 85 25 40 19 5 Low 19 137 37 5 72 Medium 55 64 93 11 14 Very_High 16 8 22 197 9 Very_Low 4 43 11 9 197

Overall Statistics

           Accuracy : 0.5923          
             95% CI : (0.5639, 0.6203)
No Information Rate : 0.2481          
P-Value [Acc > NIR] : < 2e-16         
                                      
              Kappa : 0.4874          
                                      

Mcnemar’s Test P-Value : 0.01149

Statistics by Class:

                 Class: High Class: Low Class: Medium Class: Very_High

Sensitivity 0.47486 0.4946 0.45813 0.8174 Specificity 0.91257 0.8554 0.85513 0.9425 Pos Pred Value 0.48851 0.5074 0.39241 0.7817 Neg Pred Value 0.90811 0.8490 0.88542 0.9534 Prevalence 0.14954 0.2314 0.16959 0.2013 Detection Rate 0.07101 0.1145 0.07769 0.1646 Detection Prevalence 0.14536 0.2256 0.19799 0.2105 Balanced Accuracy 0.69372 0.6750 0.65663 0.8799 Class: Very_Low Sensitivity 0.6633 Specificity 0.9256 Pos Pred Value 0.7462 Neg Pred Value 0.8928 Prevalence 0.2481 Detection Rate 0.1646 Detection Prevalence 0.2206 Balanced Accuracy 0.7944




<!-- rnb-output-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->



### 2- K-means


#### k-means clustering 1 


<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuIyBydW4gay1tZWFucyBjbHVzdGVyaW5nIHRvIGZpbmQgMiBjbHVzdGVyc1xuc2V0LnNlZWQoODk1MylcbmttZWFucy5yZXN1bHQgPC0ga21lYW5zKGRhdGFzZXQzLCAyKVxuXG4jIHByaW50IHRoZSBjbHVzdGVybmcgcmVzdWx0XG5rbWVhbnMucmVzdWx0XG5cbiMgdmlzdWFsaXplIGNsdXN0ZXJpbmdcbmZ2aXpfY2x1c3RlcihrbWVhbnMucmVzdWx0LCBkYXRhID0gZGF0YXNldDMpXG5cblxuI2F2ZXJhZ2Ugc2lsaG91ZXR0ZSBmb3IgZWFjaCBjbHVzdGVyc1xuYXZnX3NpbCA8LSBzaWxob3VldHRlKGttZWFucy5yZXN1bHQkY2x1c3RlcixkaXN0KGRhdGFzZXQzKSkgXG5mdml6X3NpbGhvdWV0dGUoYXZnX3NpbClcblxuI1dpdGhpbi1jbHVzdGVyIHN1bSBvZiBzcXVhcmVzIHdzcyBcbndzcyA8LSBrbWVhbnMucmVzdWx0JHRvdC53aXRoaW5zc1xucHJpbnQod3NzKVxuXG4jQkN1YmVkXG5jbHVzdGVyX2Fzc2lnbm1lbnRzIDwtIGMoa21lYW5zLnJlc3VsdCRjbHVzdGVyKVxuXG5ncm91bmRfdHJ1dGhfbGFiZWxzIDwtIGMoY2xhc3NMYWJlbClcblxuZGF0YSA8LSBkYXRhLmZyYW1lKGNsdXN0ZXIgPSBjbHVzdGVyX2Fzc2lnbm1lbnRzLCBsYWJlbCA9IGdyb3VuZF90cnV0aF9sYWJlbHMpXG5cbiMgRnVuY3Rpb24gdG8gY2FsY3VsYXRlIEJDdWJlZCBwcmVjaXNpb24gYW5kIHJlY2FsbFxuY2FsY3VsYXRlX2JjdWJlZF9tZXRyaWNzIDwtIGZ1bmN0aW9uKGRhdGEpIHtcbiAgbiA8LSBucm93KGRhdGEpXG4gIHByZWNpc2lvbl9zdW0gPC0gMFxuICByZWNhbGxfc3VtIDwtIDBcblxuICBmb3IgKGkgaW4gMTpuKSB7XG4gICAgY2x1c3RlciA8LSBkYXRhJGNsdXN0ZXJbaV1cbiAgICBsYWJlbCA8LSBkYXRhJGxhYmVsW2ldXG4gICAgXG4jIENvdW50IHRoZSBudW1iZXIgb2YgaXRlbXMgZnJvbSB0aGUgc2FtZSBjYXRlZ29yeSB3aXRoaW4gdGhlIHNhbWUgY2x1c3Rlclxuc2FtZV9jYXRlZ29yeV9zYW1lX2NsdXN0ZXIgPC0gc3VtKGRhdGEkbGFiZWxbZGF0YSRjbHVzdGVyID09IGNsdXN0ZXJdID09IGxhYmVsKVxuICAgIFxuIyBDb3VudCB0aGUgdG90YWwgbnVtYmVyIG9mIGl0ZW1zIGluIHRoZSBzYW1lIGNsdXN0ZXJcbnRvdGFsX3NhbWVfY2x1c3RlciA8LSBzdW0oZGF0YSRjbHVzdGVyID09IGNsdXN0ZXIpXG4gICAgXG4jIENvdW50IHRoZSB0b3RhbCBudW1iZXIgb2YgaXRlbXMgd2l0aCB0aGUgc2FtZSBjYXRlZ29yeVxudG90YWxfc2FtZV9jYXRlZ29yeSA8LSBzdW0oZGF0YSRsYWJlbCA9PSBsYWJlbClcbiAgICBcbiMgQ2FsY3VsYXRlIHByZWNpc2lvbiBhbmQgcmVjYWxsIGZvciB0aGUgY3VycmVudCBpdGVtIGFuZCBhZGQgdGhlbSB0byB0aGUgc3Vtc1xucHJlY2lzaW9uX3N1bSA8LSBwcmVjaXNpb25fc3VtICsgc2FtZV9jYXRlZ29yeV9zYW1lX2NsdXN0ZXIgL3RvdGFsX3NhbWVfY2x1c3RlclxucmVjYWxsX3N1bSA8LSByZWNhbGxfc3VtICsgc2FtZV9jYXRlZ29yeV9zYW1lX2NsdXN0ZXIgLyB0b3RhbF9zYW1lX2NhdGVnb3J5XG4gIH1cblxuICAjIENhbGN1bGF0ZSBhdmVyYWdlIHByZWNpc2lvbiBhbmQgcmVjYWxsXG4gIHByZWNpc2lvbiA8LSBwcmVjaXNpb25fc3VtIC8gblxuICByZWNhbGwgPC0gcmVjYWxsX3N1bSAvIG5cblxuICByZXR1cm4obGlzdChwcmVjaXNpb24gPSBwcmVjaXNpb24sIHJlY2FsbCA9IHJlY2FsbCkpXG59XG5cbiMgQ2FsY3VsYXRlIEJDdWJlZCBwcmVjaXNpb24gYW5kIHJlY2FsbFxubWV0cmljcyA8LSBjYWxjdWxhdGVfYmN1YmVkX21ldHJpY3MoZGF0YSlcblxuIyBFeHRyYWN0IHByZWNpc2lvbiBhbmQgcmVjYWxsIGZyb20gdGhlIG1ldHJpY3NcbnByZWNpc2lvbiA8LSBtZXRyaWNzJHByZWNpc2lvblxucmVjYWxsIDwtIG1ldHJpY3MkcmVjYWxsXG5cbiMgUHJpbnQgdGhlIHJlc3VsdHNcbmNhdChcIkJDdWJlZCBQcmVjaXNpb246XCIsIHByZWNpc2lvbiwgXCJcXG5cIilcbmNhdChcIkJDdWJlZCBSZWNhbGw6XCIsIHJlY2FsbCwgXCJcXG5cIilcbmBgYCJ9 -->

```r
# run k-means clustering to find 2 clusters
set.seed(8953)
kmeans.result <- kmeans(dataset3, 2)

# print the clusterng result
kmeans.result

# visualize clustering
fviz_cluster(kmeans.result, data = dataset3)


#average silhouette for each clusters
avg_sil <- silhouette(kmeans.result$cluster,dist(dataset3)) 
fviz_silhouette(avg_sil)

#Within-cluster sum of squares wss 
wss <- kmeans.result$tot.withinss
print(wss)

#BCubed
cluster_assignments <- c(kmeans.result$cluster)

ground_truth_labels <- c(classLabel)

data <- data.frame(cluster = cluster_assignments, label = ground_truth_labels)

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(data) {
  n <- nrow(data)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- data$cluster[i]
    label <- data$label[i]
    
# Count the number of items from the same category within the same cluster
same_category_same_cluster <- sum(data$label[data$cluster == cluster] == label)
    
# Count the total number of items in the same cluster
total_same_cluster <- sum(data$cluster == cluster)
    
# Count the total number of items with the same category
total_same_category <- sum(data$label == label)
    
# Calculate precision and recall for the current item and add them to the sums
precision_sum <- precision_sum + same_category_same_cluster /total_same_cluster
recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }

  # Calculate average precision and recall
  precision <- precision_sum / n
  recall <- recall_sum / n

  return(list(precision = precision, recall = recall))
}

# Calculate BCubed precision and recall
metrics <- calculate_bcubed_metrics(data)

# Extract precision and recall from the metrics
precision <- metrics$precision
recall <- metrics$recall

# Print the results
cat("BCubed Precision:", precision, "\n")
cat("BCubed Recall:", recall, "\n")

k-means clustering 2

# run k-means clustering to find 3 clusters
set.seed(8953)
kmeans.result <- kmeans(dataset3, 3)

# print the clusterng result
kmeans.result

# visualize clustering
fviz_cluster(kmeans.result, data = dataset3)

#average silhouette for each clusters
avg_sil <- silhouette(kmeans.result$cluster,dist(dataset3)) 
fviz_silhouette(avg_sil)

#Within-cluster sum of squares wss 
wss <- kmeans.result$tot.withinss
print(wss)

#BCubed
cluster_assignments <- c(kmeans.result$cluster)

ground_truth_labels <- c(classLabel)

data <- data.frame(cluster = cluster_assignments, label = ground_truth_labels)

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(data) {
  n <- nrow(data)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- data$cluster[i]
    label <- data$label[i]
    
# Count the number of items from the same category within the same cluster
same_category_same_cluster <- sum(data$label[data$cluster == cluster] == label)
    
# Count the total number of items in the same cluster
total_same_cluster <- sum(data$cluster == cluster)
    
# Count the total number of items with the same category
total_same_category <- sum(data$label == label)
    
# Calculate precision and recall for the current item and add them to the sums
precision_sum <- precision_sum + same_category_same_cluster /total_same_cluster
recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }

  # Calculate average precision and recall
  precision <- precision_sum / n
  recall <- recall_sum / n

  return(list(precision = precision, recall = recall))
}

# Calculate BCubed precision and recall
metrics <- calculate_bcubed_metrics(data)

# Extract precision and recall from the metrics
precision <- metrics$precision
recall <- metrics$recall

# Print the results
cat("BCubed Precision:", precision, "\n")
cat("BCubed Recall:", recall, "\n")

k-means clustering 3

# run k-means clustering to find 4 clusters
set.seed(8953)
kmeans.result <- kmeans(dataset3, 4)

# print the clusterng result
kmeans.result

# visualize clustering
fviz_cluster(kmeans.result, data = dataset3)

#average silhouette for each clusters
avg_sil <- silhouette(kmeans.result$cluster,dist(dataset3)) 
fviz_silhouette(avg_sil)

#Within-cluster sum of squares wss 
wss <- kmeans.result$tot.withinss
print(wss)

#BCubed
cluster_assignments <- c(kmeans.result$cluster)

ground_truth_labels <- c(classLabel)

data <- data.frame(cluster = cluster_assignments, label = ground_truth_labels)

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(data) {
  n <- nrow(data)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- data$cluster[i]
    label <- data$label[i]
    
# Count the number of items from the same category within the same cluster
same_category_same_cluster <- sum(data$label[data$cluster == cluster] == label)
    
# Count the total number of items in the same cluster
total_same_cluster <- sum(data$cluster == cluster)
    
# Count the total number of items with the same category
total_same_category <- sum(data$label == label)
    
# Calculate precision and recall for the current item and add them to the sums
precision_sum <- precision_sum + same_category_same_cluster /total_same_cluster
recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }

  # Calculate average precision and recall
  precision <- precision_sum / n
  recall <- recall_sum / n

  return(list(precision = precision, recall = recall))
}

# Calculate BCubed precision and recall
metrics <- calculate_bcubed_metrics(data)

# Extract precision and recall from the metrics
precision <- metrics$precision
recall <- metrics$recall

# Print the results
cat("BCubed Precision:", precision, "\n")
cat("BCubed Recall:", recall, "\n")

3- Silhouette method


fviz_nbclust(dataset3, kmeans, method = "silhouette")+
  labs(subtitle = "Silhouette method")

4- Elbow method


fviz_nbclust(dataset3, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2)+
  labs(subtitle = "Elbow method")
---
title: "Cybersecurity salaries"
output: html_notebook
---

```{r}

knitr::opts_chunk$set(warning=FALSE)

```

### Needed libraries

```{r}
library(dplyr)
library(countrycode)
library(outliers)
library(caret)
library(cluster)
library(factoextra)
library(NbClust)
library("DMwR")
library("RWeka")
library("C50")
library("rpart")
library("themis")
library(rattle)
library(rpart.plot)
library(RColorBrewer)
```

# phase 1

### Problem statement

Prediction of cyber security employees' salaries based on 11 attributes

1.work_year

2.experience_level

3.employment_type

4.job_title

5.salary

6.salary_currency

7.salary_in_usd

8.employee_residence

9.remote_ratio

10.company_location

11.company_size

### Problem description

We are living in the "information age" or rather the "data age", meaning that everything around us revolves around data. The data has become one of the most valuable assets that a person or an organisation can have, since it has a significant value, losing it will lead to significant damages. Thus, most of the attacks nowadays are directed toward the data. To guard against such damages, organisations have realised the importance of protecting their digital assets, leading them to hire cybersecurity specialists. This made cybersecurity gain popularity among people so there's a growing tendency to study cybersecurity. Consequently this resulted in the emergence of plentiful professionals with various experience levels and skills in this field. As a result, organisations may find it difficult to decide a salary for job candidates solely based on the CV. also, since the attacks improve rapidly, organisations need to hire more employees in the far future to defend against such attacks but it's not an easy matter to predict the future payroll which may hinders some of the organisation's plans. Another issue arises when the decision makers in the organisation aren't fully aware of the trends on salary. Their lack of awareness gives a chance for the competitor organisations to attract their employees to them by offering a better salary that match current trends

### Data mining task

Prediction of the cyber security employees' salary categories (Very Low, Low, , High, Very High) using classification methods.

### Goal

Given the problems we discussed and In order to better understand this field, we decided to analyse a dataset of 1247 cybersecurity employees, containing information such as salary, job title, and experience level. Analysing this dataset can provide insightful predictions regarding the salary range of a cybersecurity employee, which can help in

-   Making better decisions
-   Making recruitment and hiring process easier and more efficient
-   Predicting the future payroll
-   Increasing loyalty
-   Increasing the satisfaction rate
-   Achieving fairness

## Source of data:

<https://www.kaggle.com/datasets/deepcontractor/cyber-security-salaries>

### Reading and viewing dataset

```{r}
dataset= read.csv(url("https://raw.githubusercontent.com/SarahAlhindi/DM_project/main/Data%20Set/salaries_cyber.csv"), header=TRUE)
View(dataset)

```

### Original dataset

we will keep a copy of the original dataset before data preprocessing to use if needed at any time

```{r}
originalDataset= dataset
```

## General information about the dataset:

No. of attributes: 11\
Type of attributes: Ordinal , Nominal, and Numeric\
No. of objects: 1247\
Class label: salary_in_usd

```{r}
ncol(dataset)
nrow(dataset)
names(dataset)
str(dataset)
```

### Attributes' description table

+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| **Attribute Name** | **Description**                                             | **Data Type** | **Possible values**                                       |
+====================+=============================================================+===============+===========================================================+
| work_year          | The year in which salary was recorded                       | Numerical     | 2020 to 2022                                              |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| experience_level   | Expertise level of the employee                             | Ordinal       | En "Entry level"\                                         |
|                    |                                                             |               | MI "Mid level"\                                           |
|                    |                                                             |               | SE "Senior level"\                                        |
|                    |                                                             |               | EX "Executive level"                                      |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| employment_type    | The nature or category of employee's engagement in the job  | Nominal       | PT "Part time"\                                           |
|                    |                                                             |               | FT "Full time"\                                           |
|                    |                                                             |               | CT "Contract\                                             |
|                    |                                                             |               | FL"Freelancer"                                            |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| job_title          | The role worked in during the year                          | Nominal       | Different titles.                                         |
|                    |                                                             |               |                                                           |
|                    |                                                             |               | like Security Analyst, security researcher                |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| salary             | The total gross salary amount paid                          | Numerical     | 1740-50001566                                             |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| salary_currency    | The currency of the salary paid to the employee             | Nominal       | Different currencies according to ISO 4217 currency code. |
|                    |                                                             |               |                                                           |
|                    |                                                             |               | like DE,CA                                                |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| salary_in_usd      | The salary paid in United states dollar                     | Numerical     | 2000 to 365596.40                                         |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| employee_residence | Employee's primary country of residence                     | Nominal       | Different countries.                                      |
|                    |                                                             |               |                                                           |
|                    |                                                             |               | like US,AE                                                |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| remote_ratio       | Percentage of online work by employee in the specified year | Numerical     | 0 "No remote work"\                                       |
|                    |                                                             |               | 50 "Partially remote"\                                    |
|                    |                                                             |               | 100 "Fully remote"                                        |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| company_location   | The country of the employer's main office                   | Nominal       | Different countries.                                      |
|                    |                                                             |               |                                                           |
|                    |                                                             |               | like BR,BW                                                |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+
| company_size       | How big/small is the company                                | Ordinal       | S , M or L                                                |
+--------------------+-------------------------------------------------------------+---------------+-----------------------------------------------------------+

# phase 2

### sample of 20 employees from the dataset:

using sample_n(table,size) function and using (set_seed())

```{r}
set.seed(30)
sample=sample_n(dataset,20)
print(sample)
```

### Show the missing value:

if it is FALSE it means no null value,if it is TRUE there is null value. In our dataset there is no null values.

```{r}
is.na(dataset)
sum(is.na(dataset))
```

### Show the Min.,1st Qu.,Median,Mean ,3rd Qu.,Max. for each numeric column
in work_year Most data falls within the years 2021, with some in 2020 and 2022.
in salary Salaries vary widely, with a high average and an exceptionally high maximum.
in salary_in_usd Represents salaries in USD, ranging from a minimum to a maximum.
in remote_ratio Indicates the percentage of remote work, with a median and 3rd quartile at 100%, but a mean slightly below, suggesting some variability.
```{r}
summary(dataset$work_year)
summary(dataset$salary)
summary(dataset$salary_in_usd)
summary(dataset$remote_ratio)
```

### Show the variane of each numeric column
variance is to understand the spread or dispersion of the values in each column. A higher variance indicates that the values are more spread out from the mean and in our dataset the highest attribute is salary, while a lower variance indicates that the values are closer to the mean which in our datas it is work year attribute.
```{r}
var(dataset$work_year)
var(dataset$salary)
var(dataset$salary_in_usd)
var(dataset$remote_ratio)
```

### Visualization of relationship between some pairs of attributes:

Here we used boxplot to see the distribution between salary_in_usd and experience_level We observed that salaries vary depending on the level of experience,they are positively correlated.

```{r}
boxplot(salary_in_usd ~ experience_level, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999)
```

Here we used boxplot to see the distribution between salary_in_usd and work_year We observed that 2021 salaries were close to each other but in 2022 the gap between them getting bigger.

```{r}
boxplot(salary_in_usd ~ work_year, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999)
```

Here we used boxplot to see the distribution between salary_in_usd and employment_type We observed that Full Time (FT) offers more salary than the other categories.

```{r}
boxplot(salary_in_usd ~ employment_type, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999)
```

Here we used boxplot to see the distribution between salary_in_usd and company_size We observed that the larger the company is the higher the salary was.

```{r}
boxplot(salary_in_usd ~ company_size, data = dataset , yaxt="n")
labels<- pretty(dataset$salary_in_usd)
labels<- sapply(labels, function(x) format(x, scientific = FALSE))
axis(side = 2, at=pretty(dataset$salary_in_usd), labels = labels )
options(scipen = 999) 
```

## Data Reduction

### Dimensionality Reduction

The "salary" column gives the same information as "salary_in_usd" it's just a matter of currency exchange, and we will eventually transform all the values in "salary" column to one common currency so we can properly deal with them. To further confirm that the two column are redundant, we will use the latest exchange rate for USD to the desired currency.

we will start by creating a temporary column named "converted_salary" to save the salary that we will get by using the exchange rate to convert the salary_in_usd to the salary with different currencies to compare with "salary" column

```{r}
convertedDataset=dataset


convertedDataset$exchange_rate = factor(convertedDataset$salary_currency, levels=c("USD","BRL","GBP","EUR","INR","CAD","CHF","DKK","SGD","AUD","SEK","MXN","ILS","PLN","NOK","IDR","NZD","HUF","ZAR","TWD","RUB"), labels=c(1/1,1/0.20,1/1.22,1/1.06,1/0.012,1/0.74,1/1.10,1/0.14,1/0.73,1/0.64,1/0.090,1/0.057,1/0.26,1/0.23,1/0.093,1/0.000065,1/0.60,1/0.0027,1/0.053,1/0.031,1/0.010))
convertedDataset$exchange_rate = as.numeric(as.character(convertedDataset$exchange_rate))
convertedDataset$converted_salary = convertedDataset$salary_in_usd*convertedDataset$exchange_rate



set.seed(1)
salary_sample <- sample_n(convertedDataset[,c("salary","converted_salary")],10)

print(salary_sample)
```

as shown in the sample, the two columns are almost identical. This can be proved by correlation coefficient as well.

```{r}
correlation <- cor(convertedDataset$salary , convertedDataset$converted_salary)
print(correlation)
```

The correlation is so high but it hasn't reached 100% possibly due to rounding in the calculations and slight differences in the exchange rate over time.

To make the mining process more effiecent and has an improved quality, we decided to remove the "salary" column.

```{r}
dataset<-dataset[,-c(5)]
```

### Find the outliers and remove them:

We will show outliers with boxPlots and then remove them, to minimize noise and to get better analytical results when applying data mining techniques.

now we show (salary_in_usd) attributes' outliers. we can see that there are many outliers with exceptionally high values, thus we will remove them.

```{r}
boxplot(dataset$salary_in_usd)



OutSalary = outlier(dataset$salary_in_usd, logical =TRUE)
Find_outlier = which(OutSalary ==TRUE, arr.ind = TRUE)
dataset= dataset[-Find_outlier,]

```

now we show (remote_ratio) attributes' outliers. we can see there aren't outliers in remote_ratio, thus we don't need the last step i.e: removing outliers' rows.

```{r}
boxplot(dataset$remote_ratio)

```

now we show (work_year) attributes' outliers. we can see there aren't outliers in work_year, thus we don't need the last step i.e: removing outliers' rows.

```{r}
boxplot(dataset$work_year)

```

### Concept hierarchy generation for nominal data

the columns "company_location" and "employee_residence" have the name of countries for the company and employee respectively. And these attributes can be generalized to higher-level concept that is region to help understand and analyze the dataset better and improve algorithm performance.

We will use the 7 regions as defined in the World Bank Development Indicators. These regions are:

1.  East Asia and Pacific: This region includes countries like China, Australia, Indonesia, Thailand, etc.

2.  Europe and Central Asia: This region includes countries like Germany, UK, Russia, Turkey, etc.

3.  Latin America & Caribbean: This region includes countries like Brazil, Mexico, Argentina, Cuba, etc.

4.  Middle East and North Africa: This region includes countries like Saudi Arabia, Egypt, Iran, Iraq, etc.

5.  North America: This is predominantly United States and Canada.

6.  South Asia: This region includes countries like India, Pakistan, Bangladesh, Sri Lanka, etc.

7.  Sub-Saharan Africa: This region includes countries like Nigeria, South Africa, Ethiopia, Kenya, etc.

Note: UM(The United States Minor Outlying Islands) and AQ(Antarctica) don't belong to any of these regions, thus, they will be used as they are.

```{r}


um=which(dataset$company_location=="UM")
aq=which(dataset$company_location=="AQ")


dataset$company_location <- countrycode(dataset$company_location, "iso2c", "region")
dataset$employee_residence <- countrycode(dataset$employee_residence, "iso2c", "region")

dataset[um,"company_location"]="UM"
dataset[aq,"company_location"]="AQ"

```

Concept hierarchy generation can be done for "job_title" as well to improve interpretation and scalability. Also, most job titles are essentially the same job but with different names, so we can combine them into a higher-level jobs titles such as Architect, Analyst and Engineer etc.

```{r}
## Create the categories based on job rank 
dataset$job_title <- ifelse(grepl("Analyst", dataset$job_title), "Analyst",
                                ifelse(grepl("Architect", dataset$job_title), "Architect",
                                       ifelse(grepl("Engineer", dataset$job_title), "Engineer",
                                              ifelse(grepl("Manager|Officer|Director|Leader", dataset$job_title), "Leadership",
                                                     ifelse(grepl("Consultant|Specialist", dataset$job_title), "Consultant/Specialist",
                                                            ifelse(grepl("Cyber", dataset$job_title), "Cyber Security",
                                                                   "Others"))))))

```

## Encoding categorical data

To deal with columns with character type we are going to encode them, because most machine learning algorithms are designed to work with factors data rather than character data and it improves performance and Interpretability of data as well.

```{r}
dataset$job_title  <- factor(dataset$job_title)

dataset$experience_level = factor(dataset$experience_level, levels=c("EN", "MI", "SE", "EX"), labels=c(1,2,3,4))

dataset$employment_type  <- factor(dataset$employment_type)

dataset$employee_residence  <- factor(dataset$employee_residence)

dataset$company_location  <- factor(dataset$company_location)

dataset$salary_currency  <- factor(dataset$salary_currency)

dataset$job_title  <- factor(dataset$job_title)


dataset$company_size = factor(dataset$company_size, levels=c("S","M","L"), labels=c(1,2,3))


dataset$job_title  <- factor(dataset$job_title)

```

### Discretization of salaray_in_usd attribute

by calculating breaks based on quartiles

```{r}
breaks <- quantile(dataset$salary_in_usd, 
                   probs = c(0, .25, .5, .75, .95, 1), 
                   na.rm = TRUE)


dataset$salary_in_usd <- cut(dataset$salary_in_usd, 
                                       breaks = breaks, 
                                       include.lowest = TRUE, 
                                       labels=c("Very Low", "Low", "Medium", "High", "Very High"))


```

### Normalization:

to change the scale of numeric attributes (remote_ratio and work_year) to a scale of [-1,1] to give them equal weight

```{r}
dataset [, c("work_year" , "remote_ratio")] = scale(dataset [, c("work_year" , "remote_ratio")])
```

## Feature Selection

we will implement feature selection to remove redundant or irrelevant attributes from the data set to get the smallest subset that can help us get the most accurate predictions for our target class(salary_in_usd) and decrease the time that it takes the classifier to process the data.

we will use RFE(Recursive feature elimination) which is a wrapper method for the feature selection. Since the RFE function have multiple control options we need to specify the options that we want. We will choose "Random Forest" because it has high accuracy, can handle categorical data.

```{r}
control <- rfeControl(functions = rfFuncs, 
                      method = "repeatedcv",
                      repeats = 5, 
                      number = 10)
```

First we save the features to be used in the feature selection(every attributes except the class label "salary_in_usd") in variable x, and the class label in variable y. Then split the data to 80% training and 20% test.

```{r}
x <- dataset %>%
  select(-salary_in_usd) %>%
  as.data.frame()

# Target variable
y <- dataset$salary_in_usd

# Training: 80%; Test: 20%
set.seed(2021)
inTrain <- createDataPartition(y, p = .80, list = FALSE)[,1]

x_train <- x[ inTrain, ]
x_test  <- x[-inTrain, ]

y_train <- y[ inTrain]
y_test  <- y[-inTrain]

```

after splitting the data, now we can perform the selection using rfe

```{r}
set.seed(1)
result_rfe1 <- rfe(x = x_train, 
                   y = y_train, 
                   sizes = c(1:9),
                   rfeControl = control)

result_rfe1

predictors(result_rfe1)

```

The results show that all the remaining attributes, except for "employment_type", are selected. This is logical, as 98% of the rows have the value "FT", as shown in the table below. Due to the low variance, we decided to remove this attribute.

```{r}
table(dataset$employment_type)
```

```{r}
dataset<-dataset[,-which(names(dataset)=="employment_type")]
```

# phase 3

```{r}
dataset2= read.csv(url("https://raw.githubusercontent.com/SarahAlhindi/DM_project/main/Data%20Set/preprocessedDataset.csv"), header=TRUE)


char_vars <- sapply(dataset2, is.character)
dataset2[char_vars] <- lapply(dataset2[char_vars], as.factor)

```

## balancing data

To resolve the problem of class imbalance in the dataset, we will use SMOTE() method that oversample the minority class by creating synthetic samples using the existing minority class samples

```{r}
data_balanced <- SMOTE(salary_in_usd ~ ., dataset2, perc.over = 300, perc.under=500, k = 10)
```

## Classification

The goal of all preceding steps is to properly prepare the dataset for the classification phase, which constitutes one of our primary mining objectives. In this section, we will employ various attribute selection methods such as the Gini index, Gain ratio, and information gain to construct a decision tree model. We will thoroughly evaluate its performance, and if it proves effective, it can subsequently be utilized to classify new instances with unknown class labels.

since our dataset is small, we decided to use K-fold Cross-validation. for each attribute selection method we will try different K size (10,5, and 3).


in all this section we will be using train and trainControl functions of caret package to produce decision trees. for Gini index the method will be "rpart" and for Gain ratio it's "j48" as for information gain the method is "C5.0".

the following function will be used to compute average sensitivity and Specificity:

```{r}


macro = function(matrix){
  
  sumSen=0
  
  for (i in 1:5) {
   sumSen = sumSen + matrix$byClass[i,1] 
  }
  
  
  avgSen = sumSen/5
  
  sumSpec=0
  
  for (i in 1:5) {
   sumSpec = sumSpec + matrix$byClass[i,2] 
  }
  avgSpec = sumSpec/5
  
  
  
  
  sumPrec=0
  
  for (i in 1:5) {
   sumPrec = sumPrec + matrix$byClass[i,3] 
  }
  avgPrec = sumPrec/5
  
  
  
  
  avgs = data.frame(Sensitivity=avgSen , Specificity=avgSpec, Precision=avgPrec ,Accuracy= unname( matrix$overall[1]) )
  print(avgs)
  
  
}


```

### Gini index

Gini index measures the impurity of the dataset. The partitioning that yields the most substantial reduction in impurity is selected as the splitting attribute. To apply the Gini index, we will employ the "rpart" method, which utilizes the Gini index as the criteria for splitting.

##### 10 Folds

```{r}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 10, returnResamp="all", savePredictions="final")

giniIndex10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "rpart",trControl = ctrl)

prp(giniIndex10$finalModel, box.palette = "Reds", tweak = 1.2, varlen = 20)




```

```{r}

caret::confusionMatrix(giniIndex10$pred$obs,giniIndex10$pred$pred)

```

##### 5 Folds

```{r}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 5, returnResamp="all", savePredictions="final")

giniIndex5 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "rpart",trControl = ctrl)

prp(giniIndex5$finalModel, box.palette = "Reds", tweak = 1.2, varlen = 20)


```

```{r}

caret::confusionMatrix(giniIndex5$pred$obs,giniIndex5$pred$pred)

```

##### 3 Folds

```{r}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 3, returnResamp="all", savePredictions="final")

giniIndex3 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "rpart",trControl = ctrl)

prp(giniIndex3$finalModel, box.palette = "Reds", tweak = 1.2, varlen = 20)


```

```{r}

caret::confusionMatrix(giniIndex3$pred$obs,giniIndex3$pred$pred)

```

### Gain ratio

The gain ratio, a normalized measure of information gain, is calculated by dividing information gain by the split information. The attribute that yields the highest gain ratio is chosen as the splitting attribute. The C4.5 algorithm employs the gain ratio.

The J48 is the Java-based open-source implementation of the C4.5 algorithm, and it is included in the Weka package. This implementation allows users to conveniently apply the C4.5 decision tree.

#### 10 Folds

```{r , fig.height=70, fig.width=90}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 10, returnResamp="all", savePredictions="final")
gainRatio10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "J48",trControl = ctrl)
plot(gainRatio10$finalModel)
```

```{r}
gainRatio10cm = caret::confusionMatrix(gainRatio10$pred$obs, gainRatio10$pred$pred)

gainRatio10cm


```

#### 5 Folds

```{r , fig.height=70, fig.width=90}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 5, returnResamp="all", savePredictions="final")
gainRatio5 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "J48",trControl = ctrl)
plot(gainRatio5$finalModel)
```

```{r}

gainRatio5cm=caret::confusionMatrix(gainRatio5$pred$obs, gainRatio5$pred$pred)

gainRatio5cm

```

#### 3 Folds

```{r, fig.height=70, fig.width=90}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 3, returnResamp="all", savePredictions="final")
gainRatio3 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "J48",trControl = ctrl)
plot(gainRatio3$finalModel)
```

```{r}
gainRatio3cm=caret::confusionMatrix(gainRatio3$pred$obs, gainRatio3$pred$pred)

gainRatio3cm

```

### analasys of the gain ratio classification

all 3 trees seem to have the same structure that is

the attribute that was first selected at the node is the experience level, it has divided the tree into : right subtree : SE(Senior level) EX(Executive level) left subtree : EN(Entry-level) MI(Mid level)

Each of these subtrees further refines the classification based on the attribute "employee residence." However, there are different criteria for splitting in the right and left subtrees:

In the Right Subtree:

The split is based on whether the tuple has the value "Latin America & Caribbean." In the Left Subtree:

If the experience level is 1, the tree further partitions based on whether the tuple has the value "North America." If the experience level is 2, the split is based on "employee residence" being "Latin America & Caribbean."

The decision tree continues to select the most appropriate attributes for splitting at each node, progressively refining the decision process until it reaches the leaves, where final class labels are assigned to the instances.

```{r}
rbind("10 Folds"=macro(gainRatio10cm), "5 Folds"=macro(gainRatio5cm), "3 Folds"=macro(gainRatio3cm)  ) 
```

Based on the evaluation metrics of average Sensitivity,Precision ,Specificity, and Accuracy, it is evident that the gain ratio model, built using a 10-fold cross-validation approach, exhibits superior performance compared to the other two models. However, it's worth noting that the difference in performance between the models is relatively small. Notably, as the number of folds decreases, a corresponding decline in the model's performance becomes apparent.

### Information gain

Information Gain is a metric used to decide which attribute to choose for splitting the data at each node in the decision tree. For a given dataset, the Information Gain of an attribute is calculated by comparing the entropy before and after the dataset is split based on that attribute. The attribute with the highest Information Gain is chosen as the splitting attribute.

#### 10 Folds

```{r}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 10, returnResamp="all", savePredictions="final")


infoGain10 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "C5.0",trControl = ctrl)

c5model <- C5.0(salary_in_usd ~ .,
                       data = data_balanced,
                       trials = infoGain10$bestTune$trials, 
                       rules = FALSE,
                       control = C5.0Control(winnow = infoGain10$bestTune$winnow))
plot(c5model)
```

```{r}
caret::confusionMatrix(infoGain10$pred$obs, infoGain10$pred$pred)

```

#### 5 Folds

```{r}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 5, returnResamp="all", savePredictions="final")


infoGain5 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "C5.0",trControl = ctrl)

c5model <- C5.0(salary_in_usd ~ .,
                       data = data_balanced,
                       trials = infoGain5$bestTune$trials, 
                       rules = FALSE,
                       control = C5.0Control(winnow = infoGain5$bestTune$winnow))
plot(c5model)
```

```{r}
caret::confusionMatrix(infoGain5$pred$obs, infoGain5$pred$pred)

```

#### 3 Folds

```{r}
set.seed(10)
ctrl <- trainControl(method = "cv", number = 3, returnResamp="all", savePredictions="final")


infoGain3 <- train(salary_in_usd ~ ., data = balanced_dataset, method = "C5.0",trControl = ctrl)

c5model <- C5.0(salary_in_usd ~ .,
                       data = data_balanced,
                       trials = infoGain3$bestTune$trials, 
                       rules = FALSE,
                       control = C5.0Control(winnow = infoGain3$bestTune$winnow))
plot(c5model)
```

```{r}
caret::confusionMatrix(infoGain3$pred$obs, infoGain3$pred$pred)

```

## Clustering

### 1- prepreocessing

we encoding the rest of factor columns to transformed into numeric types before clustering

```{r}

# view data

dataset3 <- dataset
View(dataset3)

# Reserve the salary_in_usd (the class label) column in an attribute before removing it from the dataset for clustering

classLabel <- dataset3[, 5] 


# Remove the class lable from the dataset

dataset3 <- dataset3[, -5]

# encoding job_title variable

dataset3$job_title = factor(dataset3$job_title, levels=c("Analyst", "Architect", "Engineer", "Leadership", "Consultant/Specialist","Cyber Security","Others" ), labels=c(4,1,2,5,3,6,7))

# encoding salary_currency variable

dataset3$salary_currency = factor(dataset3$salary_currency, levels=c("USD","BRL","GBP","EUR","INR","CAD","CHF","DKK","SGD","AUD","SEK","MXN","ILS","PLN","NOK","IDR","NZD","HUF","ZAR","TWD","RUB"), labels=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21))

# encoding employee_residence variable

dataset3$employee_residence = factor(dataset3$employee_residence, levels=c("North America","Latin America & Caribbean","Sub-Saharan Africa", "Europe & Central Asia","East Asia & Pacific","South Asia","Middle East & North Africa"), labels=c(1,2,3,4,5,6,7))

# encoding company_location variable

dataset3$company_location = factor(dataset3$company_location, levels=c("North America","Latin America & Caribbean","Sub-Saharan Africa", "Europe & Central Asia","East Asia & Pacific","South Asia","Middle East & North Africa", "AQ", "UM"), labels=c(1,2,3,4,5,6,7,8,9))


# Data types should be transformed into numeric types before clustering. 

dataset3$experience_level <- as.numeric(as.character(dataset3$experience_level))

dataset3$job_title <- as.numeric(as.character(dataset3$job_title))

dataset3$salary_currency <- as.numeric(as.character(dataset3$salary_currency))

dataset3$employee_residence <- as.numeric(as.character(dataset3$employee_residence))

dataset3$company_location <- as.numeric(as.character(dataset3$company_location))

dataset3$company_size <- as.numeric(as.character(dataset3$company_size))

# viwe the class of variables to ensure it's transformed to numeric
sapply(dataset3, class)

dataset3 <- scale(dataset3)

```


### 2- K-means


#### k-means clustering 1 

```{r}
# run k-means clustering to find 2 clusters
set.seed(8953)
kmeans.result <- kmeans(dataset3, 2)

# print the clusterng result
kmeans.result

# visualize clustering
fviz_cluster(kmeans.result, data = dataset3)


#average silhouette for each clusters
avg_sil <- silhouette(kmeans.result$cluster,dist(dataset3)) 
fviz_silhouette(avg_sil)

#Within-cluster sum of squares wss 
wss <- kmeans.result$tot.withinss
print(wss)

#BCubed
cluster_assignments <- c(kmeans.result$cluster)

ground_truth_labels <- c(classLabel)

data <- data.frame(cluster = cluster_assignments, label = ground_truth_labels)

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(data) {
  n <- nrow(data)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- data$cluster[i]
    label <- data$label[i]
    
# Count the number of items from the same category within the same cluster
same_category_same_cluster <- sum(data$label[data$cluster == cluster] == label)
    
# Count the total number of items in the same cluster
total_same_cluster <- sum(data$cluster == cluster)
    
# Count the total number of items with the same category
total_same_category <- sum(data$label == label)
    
# Calculate precision and recall for the current item and add them to the sums
precision_sum <- precision_sum + same_category_same_cluster /total_same_cluster
recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }

  # Calculate average precision and recall
  precision <- precision_sum / n
  recall <- recall_sum / n

  return(list(precision = precision, recall = recall))
}

# Calculate BCubed precision and recall
metrics <- calculate_bcubed_metrics(data)

# Extract precision and recall from the metrics
precision <- metrics$precision
recall <- metrics$recall

# Print the results
cat("BCubed Precision:", precision, "\n")
cat("BCubed Recall:", recall, "\n")
```



#### k-means clustering 2 

```{r}
# run k-means clustering to find 3 clusters
set.seed(8953)
kmeans.result <- kmeans(dataset3, 3)

# print the clusterng result
kmeans.result

# visualize clustering
fviz_cluster(kmeans.result, data = dataset3)

#average silhouette for each clusters
avg_sil <- silhouette(kmeans.result$cluster,dist(dataset3)) 
fviz_silhouette(avg_sil)

#Within-cluster sum of squares wss 
wss <- kmeans.result$tot.withinss
print(wss)

#BCubed
cluster_assignments <- c(kmeans.result$cluster)

ground_truth_labels <- c(classLabel)

data <- data.frame(cluster = cluster_assignments, label = ground_truth_labels)

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(data) {
  n <- nrow(data)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- data$cluster[i]
    label <- data$label[i]
    
# Count the number of items from the same category within the same cluster
same_category_same_cluster <- sum(data$label[data$cluster == cluster] == label)
    
# Count the total number of items in the same cluster
total_same_cluster <- sum(data$cluster == cluster)
    
# Count the total number of items with the same category
total_same_category <- sum(data$label == label)
    
# Calculate precision and recall for the current item and add them to the sums
precision_sum <- precision_sum + same_category_same_cluster /total_same_cluster
recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }

  # Calculate average precision and recall
  precision <- precision_sum / n
  recall <- recall_sum / n

  return(list(precision = precision, recall = recall))
}

# Calculate BCubed precision and recall
metrics <- calculate_bcubed_metrics(data)

# Extract precision and recall from the metrics
precision <- metrics$precision
recall <- metrics$recall

# Print the results
cat("BCubed Precision:", precision, "\n")
cat("BCubed Recall:", recall, "\n")
```


#### k-means clustering 3 

```{r}
# run k-means clustering to find 4 clusters
set.seed(8953)
kmeans.result <- kmeans(dataset3, 4)

# print the clusterng result
kmeans.result

# visualize clustering
fviz_cluster(kmeans.result, data = dataset3)

#average silhouette for each clusters
avg_sil <- silhouette(kmeans.result$cluster,dist(dataset3)) 
fviz_silhouette(avg_sil)

#Within-cluster sum of squares wss 
wss <- kmeans.result$tot.withinss
print(wss)

#BCubed
cluster_assignments <- c(kmeans.result$cluster)

ground_truth_labels <- c(classLabel)

data <- data.frame(cluster = cluster_assignments, label = ground_truth_labels)

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(data) {
  n <- nrow(data)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- data$cluster[i]
    label <- data$label[i]
    
# Count the number of items from the same category within the same cluster
same_category_same_cluster <- sum(data$label[data$cluster == cluster] == label)
    
# Count the total number of items in the same cluster
total_same_cluster <- sum(data$cluster == cluster)
    
# Count the total number of items with the same category
total_same_category <- sum(data$label == label)
    
# Calculate precision and recall for the current item and add them to the sums
precision_sum <- precision_sum + same_category_same_cluster /total_same_cluster
recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }

  # Calculate average precision and recall
  precision <- precision_sum / n
  recall <- recall_sum / n

  return(list(precision = precision, recall = recall))
}

# Calculate BCubed precision and recall
metrics <- calculate_bcubed_metrics(data)

# Extract precision and recall from the metrics
precision <- metrics$precision
recall <- metrics$recall

# Print the results
cat("BCubed Precision:", precision, "\n")
cat("BCubed Recall:", recall, "\n")
```



### 3- Silhouette method

```{r}

fviz_nbclust(dataset3, kmeans, method = "silhouette")+
  labs(subtitle = "Silhouette method")

```



### 4- Elbow method

```{r}

fviz_nbclust(dataset3, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2)+
  labs(subtitle = "Elbow method")

```



